Part 3 · 20 min read

Finding the Real Bottleneck

Your 1 Gbps link is running at 80 Mbps. The cable is fine. The wire is fine. The ISP is fine. Now what?

What you will learn

  • Enumerate every stage in an end-to-end connection that could be the slowest link
  • Reason about CPU and RAM costs when pushing real data over a network
  • Carry a list of measurement tools, one per layer, for the systems you own
  • Resist the urge to upgrade the part you happen to be looking at

You have a 1 Gbps internet connection. You ordered the top tier. You run a speed test, and it shows 80 Mbps. You complain to the ISP. The ISP sends a technician. The technician looks at your laptop, looks at you, and asks, "What CPU is this?"

You did not see that coming. You came in expecting a problem with the cable, or the modem, or the connection to the wall. Instead the technician wants to know about your processor.

He is right. Let me explain why.

Where the bottleneck actually lives.

A network connection is not a single thing. It is a chain. Data has to flow through:

  1. The remote server's CPU and RAM, generating the data.
  2. The remote server's network stack, packaging the data.
  3. The remote server's network card and link to its switch.
  4. The remote server's data center upstream link.
  5. The wider internet — undersea cables, regional aggregators, peering points, your ISP.
  6. Your ISP's last-mile link to your home.
  7. Your modem.
  8. Your home router.
  9. Your local Wi-Fi or Ethernet link.
  10. Your laptop's network card.
  11. Your laptop's RAM and CPU, processing the incoming data.
  12. Your application, finally consuming it.

The end-to-end throughput is limited by the slowest link in this chain. If any one of those stages cannot keep up, the entire connection runs at that stage's speed, regardless of how fast everything else is.
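
A minimal sketch of that rule in Python. The stage names and capacities are made-up illustrative numbers, not measurements; the only point is that end-to-end throughput is the minimum over the chain.

```python
# Hypothetical per-stage capacities in Mbps (illustrative numbers, not measurements).
STAGES_MBPS = {
    "remote server": 10_000,
    "internet path": 2_500,
    "ISP last mile": 1_000,
    "home router": 940,
    "Wi-Fi link": 300,
    "laptop CPU": 80,
}


def find_bottleneck(stages: dict[str, float]) -> tuple[str, float]:
    """Return the (name, capacity) of the slowest stage in the chain."""
    name = min(stages, key=stages.get)
    return name, stages[name]


if __name__ == "__main__":
    name, mbps = find_bottleneck(STAGES_MBPS)
    print(f"End-to-end throughput is capped at {mbps} Mbps by: {name}")
```

Raising any other number in the table leaves the result unchanged, which is the whole argument in one line of code.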

When the ISP sells you "1 Gbps internet," they are telling you about the link between your modem and their network. That is item 6 on the list. They are not telling you anything about item 11.

Why the CPU matters for raw throughput.

Pushing 1 Gbps of TCP traffic into and out of a machine is not free. The CPU has to:

  • Receive interrupts from the NIC at a high rate.
  • Walk the kernel TCP/IP stack for every packet.
  • Reassemble the byte stream from TCP segments (and, more rarely, fragmented IP packets), check sequence numbers, send ACKs.
  • Decrypt every byte if the connection is TLS-encrypted (HTTPS is almost everything now).
  • Copy data from the kernel's receive buffer into the application's memory.
  • Run whatever your application does with the data after that.

On a modern x86 server-class chip, this is easy. On a years-old low-end laptop, it can be the bottleneck. On a Raspberry Pi, a phone, or an embedded device, it very often is.
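
To get a feel for the per-packet work, here is a back-of-the-envelope sketch. It assumes full-size 1500-byte packets and ignores framing overhead, both simplifications of mine. Real stacks soften the cost with batching techniques such as interrupt coalescing and receive offloads, so read the result as an order of magnitude, not a benchmark.

```python
def packets_per_second(link_bps: float, packet_bytes: int = 1500) -> float:
    """Packets per second needed to fill a link with packets of the given size."""
    return link_bps / (packet_bytes * 8)


def per_packet_budget_us(link_bps: float, packet_bytes: int = 1500) -> float:
    """Average time available per packet, in microseconds, if one core handles them all."""
    return 1_000_000 / packets_per_second(link_bps, packet_bytes)


if __name__ == "__main__":
    link = 1e9  # 1 Gbps
    print(f"{packets_per_second(link):,.0f} packets/s")
    print(f"{per_packet_budget_us(link):.1f} microseconds per packet")
```

At 1 Gbps this works out to roughly 83,000 packets per second, or about 12 microseconds per packet for everything in the list above.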

If your CPU cannot pull data out of the receive buffer fast enough, the buffer fills, the kernel applies backpressure via TCP's flow control window, and the remote sender slows down. The 1 Gbps link sits idle for most of the time, even though the line itself is perfectly capable.
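
The effect of a small receive window can be sketched with one formula: a sender can have at most one window of unacknowledged data in flight per round trip. The function names and the example numbers (a 64 KiB window, 20 ms round trip) are assumptions for illustration only.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on TCP throughput: at most one window of data per round trip."""
    return window_bytes * 8 / rtt_seconds


def effective_throughput_bps(link_bps: float, window_bytes: int, rtt_seconds: float) -> float:
    """The connection runs at the lower of the link rate and the window bound."""
    return min(link_bps, max_throughput_bps(window_bytes, rtt_seconds))


if __name__ == "__main__":
    # Hypothetical case: the receiver only advertises 64 KiB, with a 20 ms round trip.
    rate = effective_throughput_bps(1e9, 64 * 1024, 0.020)
    print(f"{rate / 1e6:.1f} Mbps on a 1 Gbps link")
```

With those numbers the connection tops out around 26 Mbps, no matter how fast the line is. A receiver that drains its buffer slowly keeps the advertised window small, which has the same effect.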

The same logic applies to RAM bandwidth. If you have a slow memory subsystem (say, low-clocked single-channel DDR), the copies between kernel and userland can become the limit before the CPU cores or the link do. Real-world high-throughput networking systems care a great deal about NUMA topology, memory channel layout, and cache behavior, because these are often the actual ceiling, not the network itself.
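
A crude way to see your own machine's copy rate is to time a large in-memory copy. This is only a rough sketch: it runs on a single thread, includes allocation cost, and the buffer size and repeat count are arbitrary choices of mine, so use it for a ballpark figure, not a benchmark.

```python
import time


def copy_bandwidth_gbps(size_bytes: int = 64 * 1024 * 1024, repeats: int = 5) -> float:
    """Best observed rate, in Gbps, for copying one large buffer in memory."""
    src = bytearray(size_bytes)
    best_seconds = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full copy of the buffer
        best_seconds = min(best_seconds, time.perf_counter() - start)
        del dst
    return size_bytes * 8 / best_seconds / 1e9


if __name__ == "__main__":
    print(f"Single-threaded copy rate: about {copy_bandwidth_gbps():.1f} Gbps")
```

Every received byte is copied at least once on the ordinary socket path, so the gap between this figure and your link speed is the headroom left for everything else.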

This is why the ISP technician's question is not stupid. He has seen this play out a thousand times. Someone orders a fast plan, runs it on a weak machine, and blames the wire.

The same pattern, everywhere.

Once you see this shape — the slowest link sets the speed — you start seeing it everywhere.

  • A video call freezes. Is it the network? The camera driver? The codec on the CPU? The renderer on the GPU? The downstream peer's machine? It can be any of them.
  • A database query is slow. Is it the query plan? The disk? The buffer cache? The network between the app and the DB? The client deserialization?
  • A website is sluggish. Is it the server? The backend service it calls? The DB? The CDN? The user's browser? The user's CPU rendering an unoptimized React tree?

In every case, the wrong move is to upgrade the part you happen to be looking at. The right move is to measure and find where the actual constraint lives. Senior engineers carry, in their heads, a list of layers that could be the bottleneck for the system they own, and they have, for each layer, at least one way to measure it.
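
One low-tech way to measure is to wrap each stage of a request in a timer and see where the time goes. The sketch below is a hypothetical helper, not a real profiling library; the stage names and the sleeps standing in for real work are placeholders.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage name.
timings: dict[str, float] = {}


@contextmanager
def stage(name: str):
    """Add the wall-clock duration of the enclosed block to `timings[name]`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


def slowest_stage(recorded: dict[str, float]) -> str:
    """Name of the stage that consumed the most time."""
    return max(recorded, key=recorded.get)


if __name__ == "__main__":
    with stage("query database"):
        time.sleep(0.05)  # stand-in for real work
    with stage("deserialize"):
        time.sleep(0.01)
    with stage("render"):
        time.sleep(0.02)
    for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
        print(f"{name:>15}: {seconds * 1000:6.1f} ms")
    print("Look here first:", slowest_stage(timings))
```

Wall-clock timers like this only say where time is spent, not why; they tell you which layer deserves a closer look with a more specific tool.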

Streaming, codecs, and the impossible math.

Take a video call. The full chain is roughly:

  • Camera captures raw frames into a memory buffer.
  • A codec on the CPU (or, ideally, the GPU's video encoder block) compresses each frame.
  • The compressed stream is chopped into packets and pushed onto the network.
  • Through Wi-Fi, ISP, internet, ISP, Wi-Fi, the packets arrive at the other side.
  • The remote machine reassembles the stream, decodes it on its CPU/GPU, and writes it to a memory buffer that the display reads from.

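All of this happens, every frame, in tens of milliseconds. A 30 fps video call produces a new frame about every 33 milliseconds, so each processing stage (capture, encode, packetize, decode, render) has to finish its work on a frame within that interval, or frames back up. On top of that, the total delay from camera to remote screen, including the time the packets spend crossing the network, has to stay low enough that conversation still feels natural. Across the planet.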
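
A small sketch of the arithmetic, with made-up stage timings: it computes the per-frame interval for a given frame rate and reports which processing stages, if any, take longer than that interval on their own.

```python
# Hypothetical per-frame processing times in milliseconds (not measurements).
STAGE_MS = {"capture": 4.0, "encode": 18.0, "decode": 9.0, "render": 5.0}


def frame_interval_ms(fps: float) -> float:
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


def over_budget(stage_ms: dict[str, float], fps: float) -> list[str]:
    """Stages whose per-frame cost exceeds the frame interval at this frame rate."""
    budget = frame_interval_ms(fps)
    return [name for name, ms in stage_ms.items() if ms > budget]


if __name__ == "__main__":
    for fps in (30, 60):
        late = over_budget(STAGE_MS, fps)
        print(f"{fps} fps: {frame_interval_ms(fps):.1f} ms per frame, over budget: {late or 'none'}")
```

With these placeholder numbers everything fits at 30 fps, but at 60 fps the interval shrinks to about 16.7 ms and the 18 ms encode step becomes the stage that cannot keep up.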

This is, when you think about it, extraordinary. We are doing this routinely on consumer hardware. We are also, almost always, just barely doing it. The slightest hiccup in any layer is immediately visible as a glitch.

And our brain still does it faster.

The human visual system, the human reflex arc, and the way we process speech and motion in real time all run on biology that has not had the benefit of clock-rate increases. We are, by any honest measurement, the slowest link in a video call. We do not notice, because the engineering around us has worked so hard to stay below our threshold.

That is what good engineering looks like. Not "as fast as possible." As fast as needed, with everything else optimized to not get in the way.

Push On It

  1. Pick a tool you use that feels slow. Identify, on paper, every component in its chain. Then guess which one is the bottleneck. Then measure. How often were you right?
  2. Find an old laptop (or borrow one). Try to saturate a high-speed link with it. Where does it run out of steam first? CPU? RAM? Disk write? Wi-Fi card?
  3. Look up the relationship between TCP window size, bandwidth-delay product, and throughput. Calculate the minimum TCP window required to saturate a 1 Gbps link with 100 ms round-trip latency. (The result will surprise you and explain a lot about why long-distance high-throughput transfers are hard.)

Flashcards

What determines the end-to-end throughput of a network connection?

Why does the CPU matter for raw network throughput?

What's wrong with "upgrade the part I'm looking at" as a debugging strategy?

Finding the Real Bottleneck | Junior2Senior.dev