From Wire to RAM: Buffers, Interrupts, and System Calls

A packet has arrived at your network card. It has passed through copper and switches and routers and gateways. It has been parsed by Layer 2 and Layer 3 and Layer 4. It is now sitting at the input of a small chip on your motherboard, and it needs to become something your program can read.

This is the chapter where we walk that last mile.

Why is RAM in the picture at all?

You might ask: why doesn't the network card just hand the packet straight to the CPU? Why does the data go to memory first?

The answer is mismatched speeds and decoupled responsibilities. Your network card might be delivering data at 1 Gbps. Your CPU might be running at several GHz. Your RAM has its own clock and its own bandwidth. None of them sits around waiting for the others; each does its own work, on its own clock, in parallel.
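To put a rough number on that mismatch, here is a back-of-the-envelope calculation. The 1 Gbps link comes from the paragraph above; the 3 GHz clock and the 1500-byte frame are assumptions chosen for illustration, not measurements.

```python
# Back-of-the-envelope: how many CPU cycles pass while one frame arrives?
# The CPU clock and frame size below are illustrative assumptions.
LINK_BITS_PER_SECOND = 1_000_000_000   # 1 Gbps link, as in the text
CPU_CYCLES_PER_SECOND = 3_000_000_000  # assumed 3 GHz core
FRAME_BYTES = 1500                     # assumed full-size Ethernet payload


def cycles_per_frame(frame_bytes: int, link_bps: int, cpu_hz: int) -> int:
    """CPU cycles that elapse while frame_bytes arrive on the wire."""
    bits = frame_bytes * 8
    # cycles = seconds on the wire * cycles per second, kept in integers
    return bits * cpu_hz // link_bps


cycles = cycles_per_frame(FRAME_BYTES, LINK_BITS_PER_SECOND, CPU_CYCLES_PER_SECOND)
microseconds = FRAME_BYTES * 8 * 1_000_000 / LINK_BITS_PER_SECOND
print(f"{microseconds:.0f} microseconds on the wire, about {cycles} CPU cycles")
```

Under these assumptions a single frame takes 12 microseconds to arrive, during which the CPU could have run tens of thousands of cycles of other work. That is the time a buffer buys back.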

So instead of forcing the CPU to receive each packet directly, the network card writes the packet into a region of RAM that has been pre-arranged for it — a buffer. The CPU is not required to be paying attention at the moment the packet arrives. When the CPU is ready, it reads the buffer and processes it.

Buffer vs. spool.

You will hear two related words. They are not the same thing.

A buffer is a region of memory that temporarily holds data flowing in or out. When data comes in faster than it can be processed, the buffer absorbs the difference. When data goes out and the downstream system is slow, the buffer absorbs the difference in the other direction. A buffer's job is to smooth out speed mismatches.

A spool is a queue of jobs to be processed in order, often on a slower device. The classic example is a print spool — many programs send print jobs to one printer, the spooler queues them up, and the printer processes them one at a time. A spool's job is to serialize work for a constrained resource.

Both are queues. The difference is in the intent. A buffer is about smoothing throughput; a spool is about serializing access.
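The difference in intent can be sketched in a few lines. This is a toy model, not how any particular kernel or print system is written; the class names BoundedBuffer and Spool, the capacity of 4, and the job names are all made up for the example.

```python
from collections import deque


class BoundedBuffer:
    """Fixed-capacity FIFO that absorbs bursts and counts what it cannot hold."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = deque()
        self.dropped = 0

    def put(self, item) -> bool:
        if len(self.items) >= self.capacity:
            self.dropped += 1  # a real system might block the producer instead
            return False
        self.items.append(item)
        return True

    def get(self):
        return self.items.popleft() if self.items else None


class Spool:
    """Queue of whole jobs, handed to a single device one at a time."""

    def __init__(self):
        self.jobs = deque()

    def submit(self, owner: str, job: str) -> None:
        self.jobs.append((owner, job))

    def run_all(self) -> list:
        finished = []
        while self.jobs:
            owner, job = self.jobs.popleft()  # strictly in arrival order
            finished.append(f"{owner}:{job}")
        return finished


# A burst of 6 packets hits a buffer with room for 4.
rx_buffer = BoundedBuffer(capacity=4)
for n in range(6):
    rx_buffer.put(f"packet-{n}")
first_out = rx_buffer.get()

# Two programs share one printer through the spool.
spool = Spool()
spool.submit("editor", "report.pdf")
spool.submit("browser", "ticket.pdf")
printed = spool.run_all()
print(rx_buffer.dropped, first_out, printed)
```

The buffer has a size limit because its purpose is to ride out a temporary mismatch; the spool has no limit here because its purpose is to line jobs up for one device.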

Shared memory and the way devices coexist.

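Inside your computer, there is one pool of physical RAM. The operating system carves it up. Some of it is allocated to your applications. Some of it is allocated to the kernel. Some of it is set aside as buffers that hardware devices read and write directly, a technique called direct memory access (DMA); this is how the network card deposits a packet without the CPU copying each byte. A related but distinct mechanism is Memory-Mapped I/O (MMIO).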

In MMIO, certain regions of the physical address space are not backed by ordinary RAM but by registers on devices — the network card, the graphics card, the storage controller, and so on. When the CPU reads from those addresses, it is reading hardware registers. When it writes to them, it is writing to hardware. This is how the CPU configures and communicates with peripherals.

Your graphics card is the most dramatic example. The frame buffer — the actual pixel data that gets rendered to your monitor — lives in a memory region that both the GPU and (with the right permissions) the CPU can access. When you draw a pixel on screen, what is actually happening is: somebody wrote a value to a memory address that the graphics hardware is constantly reading and refreshing onto the screen at, say, 60 or 120 times per second.
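"Writing a value to a memory address" comes down to simple offset arithmetic. The sketch below uses an ordinary bytearray as a stand-in for the frame buffer, and it assumes a 640x480 screen, 4 bytes per pixel in blue-green-red-padding order, and rows packed with no padding. Real hardware reports its own resolution, pixel format, and row stride, so treat every constant here as hypothetical.

```python
# A plain bytearray standing in for a frame buffer. All layout details
# below are assumptions made for this example.
WIDTH, HEIGHT = 640, 480
BYTES_PER_PIXEL = 4  # assumed order: blue, green, red, one unused byte

framebuffer = bytearray(WIDTH * HEIGHT * BYTES_PER_PIXEL)


def pixel_offset(x: int, y: int) -> int:
    """Byte offset of pixel (x, y) in a row-major buffer with no row padding."""
    return (y * WIDTH + x) * BYTES_PER_PIXEL


def set_pixel(x: int, y: int, red: int, green: int, blue: int) -> None:
    """'Draw' a pixel by writing four bytes at the computed offset."""
    start = pixel_offset(x, y)
    framebuffer[start:start + BYTES_PER_PIXEL] = bytes((blue, green, red, 0))


set_pixel(10, 2, red=255, green=0, blue=0)
print(pixel_offset(10, 2), framebuffer[pixel_offset(10, 2) + 2])
```

With a real frame buffer the only change in principle is where the bytes live: the same store instruction lands in memory the display hardware scans out.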

This is also why malware that gets raw access to physical memory can do extraordinary damage. Direct memory writes can bypass nearly every software defense, modify other processes, exfiltrate data, or rewrite kernel code. Modern operating systems and CPUs have many protections against this (IOMMUs, memory protection bits, SMEP/SMAP, virtualization extensions), but the principle remains: whoever controls the memory controls the machine.

Interrupts — how the CPU finds out something happened.

How does the CPU know that a packet has arrived in the buffer?

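The naive answer is "it checks." This is called polling, and as a default it is wasteful: the CPU spends most of its time asking "anything yet? Anything yet? Anything yet?" when the answer is usually no. (Under sustained heavy traffic the trade-off can flip, which is why some high-throughput designs such as DPDK poll on purpose.) The usual answer is interrupts.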

An interrupt is a hardware signal sent from a device to the CPU. When the network card finishes writing a packet into the buffer, it raises an interrupt request (IRQ). The CPU stops whatever it was doing, saves its state, and jumps to an interrupt service routine — a small piece of kernel code dedicated to handling this kind of event.

The interrupt service routine does the minimum necessary work, usually marking that data is available, and hands the rest of the processing off to a lower-priority kernel path. Then the CPU restores the state it had before and resumes the original work.
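The split between "do the minimum now" and "do the rest later" can be modeled in ordinary code. The following is a pure simulation with made-up names (rx_ring, pending_work, nic_receives); real interrupt handlers are kernel code and look nothing like Python, but the division of labor is the same.

```python
from collections import deque

rx_ring = deque()       # stands in for the RAM buffer the network card fills
pending_work = deque()  # notes left by the interrupt handler for later
log = []


def interrupt_service_routine() -> None:
    """Do the minimum: record that data is available, then return."""
    pending_work.append("rx")
    log.append("isr: flagged rx")


def nic_receives(frame: bytes) -> None:
    """Simulated device: store the frame, then 'raise an interrupt'."""
    rx_ring.append(frame)
    interrupt_service_routine()


def deferred_processing() -> int:
    """Lower-priority path: runs later and does the real per-frame work."""
    handled = 0
    while pending_work:
        pending_work.popleft()
        while rx_ring:
            frame = rx_ring.popleft()
            log.append(f"deferred: processed {len(frame)} bytes")
            handled += 1
    return handled


nic_receives(b"a" * 60)
nic_receives(b"b" * 1500)
handled = deferred_processing()
print(handled, log)
```

Note that both "interrupts" finish before any frame is actually processed. Keeping the handler short is what lets the CPU get back to its original work quickly.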

This model is one of the foundational ideas of every modern computer. The CPU is not constantly checking on hardware. The hardware tells the CPU when something has happened. The CPU spends its time doing real work, and only gets pulled away briefly when needed.

The system call boundary.

Your application code does not get to read raw memory buffers from the network card. There are two reasons. First, security — if every process could read raw network buffers, no privacy or isolation would exist between processes. Second, abstraction — applications should not need to know whether the data came from Ethernet, Wi-Fi, fiber, or a USB tether. The kernel hides all of that.

The boundary between userland code and kernel code is the system call. When your Node.js or Python or Go program reads from a socket, what is actually happening is a system call into the kernel. The kernel checks the socket's receive buffer. If data is there, it copies the data into your program's memory and returns. If no data is there, depending on the mode, it either blocks until data arrives or returns immediately with a "would block" indication.
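Both modes are easy to observe from Python, whose socket module is a thin wrapper over these system calls. This sketch uses socket.socketpair() to get two connected sockets inside one process, so it needs no network; the variable names are arbitrary.

```python
import socket

# Two connected sockets in one process: whatever `a` sends, `b` receives.
a, b = socket.socketpair()

# Non-blocking mode: with an empty receive buffer, recv returns at once
# and Python reports the "would block" condition as BlockingIOError.
b.setblocking(False)
try:
    b.recv(1024)
    outcome = "got data"
except BlockingIOError:
    outcome = "would block"

# Blocking mode: recv waits until the kernel has data to copy out.
b.setblocking(True)
a.sendall(b"hello")
data = b.recv(1024)

print(outcome, data)

a.close()
b.close()
```

The first recv finds nothing in the receive buffer and returns immediately. The second finds the bytes that sendall placed there and copies them into the program's memory.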

Every read, every write, every accept, every connect is a syscall. Syscalls are not free: each one involves a transition from user mode to kernel mode and back. High-performance networking systems are heavily designed around minimizing the cost of these transitions. Techniques like epoll, kqueue, io_uring, zero-copy networking, and DPDK are all ways of doing more work per syscall, or of avoiding syscalls altogether by mapping the network buffer directly into the application's memory.

This is also where the modern asynchronous-everything style of programming comes from. The "event loop" inside Node.js, the async/await in Python and Rust, the goroutine scheduler in Go — they are all sitting on top of the same underlying kernel mechanisms for being told when sockets are ready, without having to keep one thread blocked per connection.
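A minimal version of "being told when sockets are ready" looks like this. Python's standard selectors module picks a readiness mechanism suited to the platform (epoll on Linux, kqueue on BSD and macOS, for example). The socket pair and the label "connection-b" are stand-ins invented for the demonstration.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # picks a mechanism suited to the platform
a, b = socket.socketpair()
b.setblocking(False)

# Ask the kernel to tell us when `b` has something to read.
sel.register(b, selectors.EVENT_READ, data="connection-b")

# Nothing has been sent yet, so no socket is reported ready.
idle = sel.select(timeout=0)

a.sendall(b"ping")

# Now the kernel reports `b` as readable; one thread could watch
# thousands of sockets with this same call.
received = []
for key, _events in sel.select(timeout=1):
    received.append((key.data, key.fileobj.recv(1024)))

print(idle, received)

sel.unregister(b)
sel.close()
a.close()
b.close()
```

An event loop is, at its core, this select call in a loop, with a callback or coroutine attached to each registered socket in place of the label.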

The crystal that keeps the music in time.

We have been talking about clocks. The CPU runs at billions of cycles per second. The RAM operates on its own clock. The network card has its clock. The PCI bus has its clock. Every chip in the machine is dancing to some heartbeat.

Where does that heartbeat actually come from?

It comes from a tiny piece of quartz crystal soldered onto your motherboard (and several smaller ones distributed across the chips). A quartz crystal, when supplied with a small electric current, vibrates at a precise and stable frequency. The crystal's vibration is converted into an electrical pulse train, and that pulse train becomes the master clock of the system. Frequency multipliers and dividers turn that base clock into the many derived clocks that the various chips run on.
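The multiply-and-divide step is plain arithmetic. The figures below (a 25 MHz crystal, a times-4 stage, a times-36 core multiplier, a divide-by-3 bus clock) are assumptions picked to give round numbers, not the specification of any particular board.

```python
from fractions import Fraction


def derived_hz(base_hz, multiplier: int, divider: int = 1) -> Fraction:
    """Frequency after a multiply-then-divide stage, kept exact."""
    return Fraction(base_hz) * multiplier / divider


CRYSTAL_HZ = 25_000_000  # assumed crystal frequency

reference = derived_hz(CRYSTAL_HZ, multiplier=4)             # 100 MHz
core = derived_hz(reference, multiplier=36)                  # 3.6 GHz
slow_bus = derived_hz(reference, multiplier=1, divider=3)    # about 33.3 MHz

print(f"reference {float(reference) / 1e6:.0f} MHz")
print(f"core      {float(core) / 1e9:.1f} GHz")
print(f"slow bus  {float(slow_bus) / 1e6:.2f} MHz")
```

Every derived clock is an exact ratio of the crystal's frequency, which is why a drift in the crystal shows up everywhere at once.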

Take a moment with that, because it is kind of incredible. The reason your CPU can execute instructions in a coherent order, the reason your screen refreshes evenly, the reason packets arrive at the right rate: all of it depends, ultimately, on a chemically engineered piece of crystal vibrating at a known frequency.

A chemical engineer made that crystal. A physicist worked out why it vibrates so reliably. A materials engineer figured out how to produce it at scale. An electrical engineer figured out how to drive it. A computer architect figured out how to use it. The little black box on your desk is the joint output of all of those people, none of whom are typically in a computer science curriculum.

This is the deeper lesson of this chapter: modern software is sitting on a stack of physical, electrical, chemical, and mechanical engineering that almost nobody on the software side ever sees. The senior engineer is the one who has at least glanced at each of these layers, and who therefore has a more honest picture of where the limits of their system come from.

Push On It

Read about the kernel's "socket receive buffer." What happens to incoming data if the application is too slow to read it? How does the kernel apply backpressure? What does this have to do with TCP's window size?
Look up epoll (Linux), kqueue (BSD/macOS), and IOCP (Windows). What problem do they all solve? How is the approach different from the older select and poll syscalls?
On Linux, run cat /proc/interrupts. You will see a giant table of interrupt counts per CPU per device. Find the row for your network card. Watch how the numbers change while you load a heavy website.
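For the last exercise, a small script can total the per-CPU columns so the numbers are easier to compare between runs. This is a sketch that assumes the usual layout of /proc/interrupts (a header row of CPU names, then one row per interrupt with a label, one count per CPU, and a description). The function names are made up, and what your network card's row is called depends on your driver.

```python
from pathlib import Path


def parse_interrupts(text: str) -> dict:
    """Map each row label to (total count across CPUs, description)."""
    lines = text.splitlines()
    if not lines:
        return {}
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:cpu_count]:
            if not field.isdigit():
                break  # some rows carry fewer counts than there are CPUs
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def rows_matching(table: dict, needle: str) -> dict:
    """Rows whose description mentions `needle`, e.g. an interface name."""
    return {k: v for k, v in table.items() if needle in v[1]}


path = Path("/proc/interrupts")
try:
    live = parse_interrupts(path.read_text()) if path.is_file() else {}
except OSError:
    live = {}  # not on Linux, or not permitted to read the file

# Show the five busiest interrupt sources, if the file was readable.
for label, (total, description) in sorted(live.items(), key=lambda kv: -kv[1][0])[:5]:
    print(f"{label:>6} {total:>12} {description}")
```

Run it twice, a few seconds apart, while loading a heavy website, and compare the totals for the row that belongs to your network card.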
