Part 2 · 10 min read

The Memory Hierarchy

Understand why computers have different types of memory and how data moves between them.

What you will learn

  • Understand the different levels of the memory hierarchy
  • Learn why speed and capacity trade off against each other
  • Recognize how cache misses affect performance
  • Apply this knowledge to write cache-friendly code

Your computer doesn't just have "memory." It has a carefully designed hierarchy of storage, each level trading speed for capacity. Understanding this hierarchy is fundamental to writing fast software.

The Problem: Speed vs. Capacity

Here's a fundamental truth of computer engineering: fast storage is small, large storage is slow. This isn't a limitation we'll eventually overcome - it's physics. The faster you can access data, the more expensive and power-hungry the storage, and the harder it is to make it large.

This creates a dilemma. We want our programs to access lots of data quickly. But we can't afford to make all storage as fast as the fastest type.

The solution? A hierarchy.

The Memory Hierarchy

Picture a pyramid. At the top, a tiny amount of blindingly fast storage. At the bottom, vast amounts of slow storage. Data flows up and down this pyramid as needed.

Level | Type          | Typical Size   | Typical Latency | Analogy
------|---------------|----------------|-----------------|----------------------------
1     | CPU Registers | ~1 KB          | <1 ns           | Your hands
2     | L1 Cache      | 32-64 KB       | ~1 ns           | Your desk
3     | L2 Cache      | 256 KB - 1 MB  | ~4 ns           | Your drawer
4     | L3 Cache      | 8-32 MB        | ~10 ns          | Your filing cabinet
5     | RAM           | 8-128 GB       | ~100 ns         | Your office
6     | SSD           | 256 GB - 4 TB  | ~100 µs         | The library down the street
7     | HDD           | 1-20 TB        | ~10 ms          | A warehouse across town

Look at those latency numbers. L1 cache is 100x faster than RAM. RAM is 1000x faster than an SSD. When your program waits for data from a slow level, it's not doing useful work.

How Caches Work

Caches operate on a simple principle: keep recently used data close by, because you'll probably need it again.

When your CPU needs data, it checks in order:

  1. Is it in L1 cache? (fastest check)
  2. Is it in L2 cache?
  3. Is it in L3 cache?
  4. Is it in RAM?
  5. Is it on disk? (slowest, might need to load from SSD/HDD)

If data is found at a level, that's a cache hit. If not, it's a cache miss, and the CPU must wait while data is fetched from a lower level.

CPU needs data at address X

Check L1: Miss (data not there)
    └── Check L2: Miss
            └── Check L3: Miss
                    └── Check RAM: Hit!
                            └── Load data
                            └── Copy to L3, L2, L1 for next time
                            └── Return to CPU

Notice that when data is fetched, it's copied to all the cache levels above. This way, if you need it again soon, it'll be right there waiting.
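
To make the lookup-and-backfill flow concrete, here's a toy simulation in JavaScript. It's only a sketch of the logic above: real caches are fixed-size hardware structures, and the Map-based levels and the address "X" are illustrative assumptions, not how any actual CPU stores data.

// Toy model of a multi-level lookup with backfill.
// Real caches are managed in silicon; this only illustrates
// the "check each level, then copy upward" flow.
const levels = [
    { name: "L1", store: new Map() },
    { name: "L2", store: new Map() },
    { name: "L3", store: new Map() },
];
const ram = new Map([["X", 42]]); // backing store, always hits

function read(address) {
    for (const level of levels) {
        if (level.store.has(address)) {
            console.log(`${level.name}: hit`);
            return level.store.get(address);
        }
        console.log(`${level.name}: miss`);
    }
    const value = ram.get(address); // fetched from RAM
    // Backfill every cache level so the next read hits in L1.
    for (const level of levels) level.store.set(address, value);
    return value;
}

read("X"); // misses L1/L2/L3, fetches from RAM, backfills
read("X"); // hits L1 immediately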

Cache Lines: Data Comes in Chunks

Here's a crucial detail: caches don't fetch individual bytes. They fetch cache lines - typically 64 bytes at a time.

Why does this matter? If you read one byte from memory, you actually get 64 bytes loaded into cache. If the next byte you need is adjacent (within that same 64-byte chunk), it's already in cache - free!

This is why accessing data sequentially is so much faster than jumping around randomly.
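
You can observe the cache-line effect directly. The sketch below assumes a 64-byte line (8 Float64 elements) and sums a large array twice: once touching every element, once touching only one element per line. Stride 8 does an eighth of the work, yet it still pulls every cache line from memory, so on most machines it runs far less than 8x faster; exact numbers vary by hardware and JavaScript engine.

// Rough demonstration of cache-line effects (results vary by machine).
// A Float64Array element is 8 bytes, so 8 elements share one 64-byte line.
const data = new Float64Array(8_000_000); // 64 MB, far bigger than any cache

function sumWithStride(stride) {
    let sum = 0;
    for (let i = 0; i < data.length; i += stride) {
        sum += data[i];
    }
    return sum;
}

console.time("stride 1 (every element)");
sumWithStride(1);
console.timeEnd("stride 1 (every element)");

console.time("stride 8 (one element per cache line)");
sumWithStride(8);
console.timeEnd("stride 8 (one element per cache line)");

// Stride 8 reads 1/8 of the elements but still touches every
// cache line, so it is nowhere near 8x faster than stride 1.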

Locality: The Key to Cache Performance

Two principles determine whether your code plays nicely with caches:

Temporal Locality: If you accessed data recently, you'll likely access it again soon.

// Good temporal locality - sum is accessed repeatedly
let sum = 0;
for (let i = 0; i < 1000; i++) {
    sum += array[i];  // 'sum' stays in cache
}

Spatial Locality: If you accessed data at location X, you'll likely access data near X soon.

// Good spatial locality - array elements are adjacent
for (let i = 0; i < array.length; i++) {
    process(array[i]);  // Sequential access, cache-friendly
}

// Poor spatial locality - jumping around
for (let i = 0; i < array.length; i++) {
    process(array[randomIndex()]);  // Random access, cache-hostile
}

The Hidden Cost of Cache Misses

Let's make this concrete. Imagine a loop that processes a million items:

for (let i = 0; i < 1_000_000; i++) {
    result += data[i];
}

If each access is a cache hit (data in L1): ~1 million nanoseconds = 1 millisecond.

If each access is a cache miss (data must come from RAM): ~100 million nanoseconds = 100 milliseconds.

Same code, 100x slower, just because data wasn't in cache.
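
Real programs land somewhere in between, which is what the classic average memory access time formula captures. Here's a quick back-of-envelope sketch using the table's rough latencies and an assumed 95% hit rate (an example figure, not a measurement):

// Average access time = hit time + miss rate * miss penalty.
// Latencies are the rough figures from the table above; the
// 95% hit rate is just an assumed example value.
const l1HitNs = 1;
const ramPenaltyNs = 100;
const missRate = 0.05; // 5% of accesses go all the way to RAM

const averageNs = l1HitNs + missRate * ramPenaltyNs;
console.log(averageNs); // 6 ns -- 6x slower than a pure-L1 loop

Even a 5% miss rate makes the average access 6x slower, which is why shaving a few points off the miss rate can matter more than micro-optimizing the arithmetic.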

Practical Example: 2D Array Traversal

Consider a 2D array (common in image processing, games, scientific computing):

// Array stored in row-major order:
// [[a,b,c], [d,e,f], [g,h,i]]
// In memory: [a,b,c,d,e,f,g,h,i]

// Row-by-row access (FAST)
for (let row = 0; row < height; row++) {
    for (let col = 0; col < width; col++) {
        process(array[row][col]);
    }
}

// Column-by-column access (SLOW)
for (let col = 0; col < width; col++) {
    for (let row = 0; row < height; row++) {
        process(array[row][col]);
    }
}

Both loops visit every element exactly once. But row-by-row traversal accesses memory sequentially, hitting cache lines in order. Column-by-column jumps around, potentially missing cache on every access.

In practice, this can mean a 10-50x performance difference for large arrays.
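
You can measure the gap yourself. The sketch below stores the grid as one flat typed array indexed as row * width + col, matching the row-major layout described above; the dimensions are arbitrary, and exact timings depend on your machine.

// Row-major vs column-major traversal of the same flat array.
// Timings are machine-dependent; the gap grows with array size.
const width = 4000, height = 4000;
const grid = new Float64Array(width * height); // 128 MB

console.time("row-by-row (sequential)");
let sumRows = 0;
for (let row = 0; row < height; row++) {
    for (let col = 0; col < width; col++) {
        sumRows += grid[row * width + col];
    }
}
console.timeEnd("row-by-row (sequential)");

console.time("column-by-column (strided)");
let sumCols = 0;
for (let col = 0; col < width; col++) {
    for (let row = 0; row < height; row++) {
        sumCols += grid[row * width + col]; // jumps width * 8 bytes each step
    }
}
console.timeEnd("column-by-column (strided)");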

Beyond Code: System-Level Implications

This hierarchy affects more than just loops:

Databases keep frequently accessed data in memory (the buffer pool) and less frequently accessed data on SSD.

Operating systems use RAM as a cache for disk (page cache). When you read a file twice, the second read is much faster.

Web browsers cache resources locally so they don't need to fetch from network (the slowest "level" of all).

CDNs are essentially a cache layer that moves data geographically closer to users.

The pattern repeats at every scale: keep hot data close, accept the latency penalty for cold data.
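
In application code, the same idea usually shows up as an LRU (least recently used) cache: keep the hottest entries, evict the coldest. Here's a minimal sketch built on JavaScript's Map, which iterates keys in insertion order; in production you'd reach for a tested library rather than this toy.

// Minimal LRU cache: evict the least recently used entry when full.
// Relies on Map preserving insertion order; a sketch, not production code.
class LRUCache {
    constructor(capacity) {
        this.capacity = capacity;
        this.map = new Map();
    }
    get(key) {
        if (!this.map.has(key)) return undefined; // miss
        const value = this.map.get(key);
        this.map.delete(key);     // delete + re-insert moves the key
        this.map.set(key, value); // to the "most recently used" end
        return value; // hit
    }
    set(key, value) {
        if (this.map.has(key)) {
            this.map.delete(key);
        } else if (this.map.size >= this.capacity) {
            // Evict the oldest (least recently used) entry.
            this.map.delete(this.map.keys().next().value);
        }
        this.map.set(key, value);
    }
}

const cache = new LRUCache(2);
cache.set("a", 1);
cache.set("b", 2);
cache.get("a");    // "a" is now most recently used
cache.set("c", 3); // evicts "b", the least recently used
console.log(cache.get("b")); // undefined -- miss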

Key Takeaways

  1. Memory has levels: Fast & small at the top, slow & large at the bottom.
  2. Caches hide latency: They automatically keep recently used data close to the CPU.
  3. Locality matters: Access data sequentially when possible. Reuse data while it's hot.
  4. Cache misses are expensive: A single RAM access costs as much as ~100 cache hits.
  5. Think in cache lines: Data comes in 64-byte chunks. Sequential access is your friend.

When you write code, you're not just telling the computer what to compute - you're implicitly telling it how to move data through this hierarchy. Senior engineers write code that works with the hierarchy, not against it.

Memory-Aware Thinking

You now understand why data location matters for performance. When optimizing code, you'll think about where data lives in the memory hierarchy.

