Ports: How Data Finds Your Application

Layer 4: how a 16-bit number on top of an IP address routes a packet to the right process on a busy machine.

What you will learn

  • Describe the four-tuple that uniquely identifies a TCP connection
  • List well-known ports for the protocols you use every day
  • Explain what an application is actually asking the kernel to do when it "listens on port 3000"
  • Connect Unix's "everything is a file" idea to network sockets

You have an IP address. The packet has arrived at your machine. Now what? Your laptop is probably running a web browser, a chat client, a video player, a code editor, a few background services, and several dozen system processes. Which one of them is supposed to handle this packet?

That is what ports are for. Layer 4. The transport layer.

A port is a 16-bit number, a value from 0 to 65535, that identifies a specific application endpoint on a given IP address. Together, an IP address and a port form a socket address, often just called a socket. A TCP connection between two machines is uniquely identified by a four-tuple: source IP + source port + destination IP + destination port.

When your browser opens a connection to google.com, what is actually happening is:

  • Your browser asks the operating system for an unused local port — say, 54321.
  • It opens a TCP connection from your_local_ip:54321 to google_ip:443.
  • Every packet for that connection carries that four-tuple.
  • The operating system can have many such connections open simultaneously, even to the same server, distinguished only by the source port.
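The steps above can be watched on a single machine. This is a minimal sketch, assuming Python's standard socket module and a loopback listener standing in for the remote server; the helper names open_loopback_connection and four_tuple are made up for illustration.

```python
import socket


def open_loopback_connection():
    """Open one TCP connection over loopback.

    Returns (client, server_side, listener). The listener plays the role
    of the remote server; binding to port 0 asks the OS for any free port.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    # The OS assigns the client's source port during connect().
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    return client, server_side, listener


def four_tuple(sock):
    """Return (local IP, local port, remote IP, remote port) for a connected socket."""
    local_ip, local_port = sock.getsockname()
    remote_ip, remote_port = sock.getpeername()
    return (local_ip, local_port, remote_ip, remote_port)


if __name__ == "__main__":
    client, server_side, listener = open_loopback_connection()
    print("client view:", four_tuple(client))
    print("server view:", four_tuple(server_side))
    for s in (client, server_side, listener):
        s.close()
```

The two views contain the same four values with local and remote swapped: both ends agree on the tuple, they just read it from opposite sides.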

Well-known port numbers.

The internet has standardized a handful of port numbers for common services. You should memorize at least these:

  • 20, 21 — FTP (data and control)
  • 22 — SSH
  • 25 — SMTP (sending email)
  • 53 — DNS
  • 80 — HTTP
  • 110 — POP3 (receiving email, old)
  • 143 — IMAP (receiving email, modern)
  • 443 — HTTPS
  • 3306 — MySQL
  • 5432 — PostgreSQL
  • 6379 — Redis
  • 27017 — MongoDB

These are conventions, not laws. You can run a web server on port 8080 (and many people do, because ports below 1024 require elevated privileges on Unix-like systems). You can run SSH on port 2222 to keep noisy automated scanners away. The conventions exist so that, in the absence of other information, clients know where to look.

What does a port actually mean inside the machine?

When an application "listens on port 3000," what is the operating system actually doing?

It is opening a socket (a kernel-managed data structure) that says "any incoming TCP traffic on port 3000 should be delivered to this process." Until that socket is opened, TCP connection attempts to port 3000 are answered with a reset (RST), which the client sees as "connection refused." (The ICMP "port unreachable" error is what a closed UDP port produces.) Once the socket is open, the OS queues matching traffic for the process, which picks it up through system calls (accept, read, write, recv, send).
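A short sketch of that before-and-after behavior, assuming Python's standard socket module on a Unix-like system. The names try_connect and listen_then_close are hypothetical, and the OS picks the port so the example does not collide with anything already running.

```python
import socket


def try_connect(port, host="127.0.0.1"):
    """Attempt a TCP connection and report what happened."""
    try:
        with socket.create_connection((host, port), timeout=5):
            return "connected"
    except ConnectionRefusedError:
        # No listening socket on that port: the kernel refused the attempt.
        return "refused"


def listen_then_close():
    """Connect once while a listener exists, then again after it is closed."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    listener.listen(5)
    port = listener.getsockname()[1]

    while_listening = try_connect(port)
    listener.close()
    after_close = try_connect(port)
    return while_listening, after_close


if __name__ == "__main__":
    print(listen_then_close())
```

Nothing about the port number changes between the two attempts. The only difference is whether a listening socket exists for the kernel to hand the connection to.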

If two processes try to listen on the same port on the same IP, the second one fails with an "address already in use" error. This is why, when your local dev server refuses to start, the message is usually some variation of "port 3000 is already in use." Somebody else got there first.
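The same failure can be reproduced inside one process, since the kernel objects to the second bind regardless of who asks. A minimal sketch, assuming Python's standard library; second_bind_errno is an illustrative name.

```python
import errno
import socket


def second_bind_errno():
    """Bind two TCP sockets to the same IP and port; return the second bind's errno.

    Returns None if the second bind unexpectedly succeeds.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))  # OS picks a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))  # same IP, same port
        except OSError as exc:
            return exc.errno
        return None
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    code = second_bind_errno()
    print(code == errno.EADDRINUSE, errno.errorcode.get(code))
```

The errno reported here, EADDRINUSE, is what frameworks translate into their friendlier "port is already in use" messages.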

Ports from 49152 up are the range IANA designates for ephemeral source ports on outbound connections, although some operating systems default to a wider range. Ports below 1024 are reserved for well-known services and, on Unix-like systems, normally require root or administrative privileges to bind to. The middle range (1024-49151) is used for everything else.
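As a quick reference, the three ranges can be written as a small classifier. This is only a sketch: the boundaries follow the IANA convention, and the function name and the labels it returns are chosen here for illustration.

```python
def classify_port(port: int) -> str:
    """Classify a port number by the conventional three ranges."""
    if not 0 <= port <= 65535:
        raise ValueError(f"not a valid 16-bit port number: {port}")
    if port < 1024:
        return "well-known"   # 0-1023: privileged on Unix-like systems
    if port < 49152:
        return "registered"   # 1024-49151: everything else
    return "ephemeral"        # 49152-65535: dynamic source ports


if __name__ == "__main__":
    for p in (22, 443, 3000, 5432, 54321):
        print(p, classify_port(p))
```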

The 64K trap.

The port number is 16 bits, so there are at most 65,536 possible values per IP address. That is 64K (2^16) values, not kilobytes, though it is the same number you remember from old textbooks. This is the ceiling on the number of conversations a host can simultaneously hold open with the same remote IP and port.

From the client side, your operating system picks an ephemeral source port for each outbound connection. The range it picks from depends on the OS: Linux defaults to 32768 through 60999, while IANA suggests 49152 through 65535. Each unique conversation is identified by the four-tuple (source IP, source port, destination IP, destination port). The destination IP and destination port are fixed when you are connecting to one server on one well-known port. The source IP is fixed by your interface. The only thing left to vary is the source port. So with Linux defaults you have about 28,000 simultaneous outbound connections you can open from one client to one server-port pair, and never more than about 64,000 however you tune the range.
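The arithmetic is worth doing once with real numbers. The sketch below assumes two ranges: the Linux default of 32768 to 60999 and the IANA-suggested 49152 to 65535. Your system may be configured differently, so the second helper reads the live setting from /proc on Linux and returns None anywhere that file does not exist. Both function names are illustrative.

```python
from pathlib import Path


def ephemeral_port_count(low: int, high: int) -> int:
    """Number of source ports in an inclusive range [low, high]."""
    if not 0 <= low <= high <= 65535:
        raise ValueError("range must satisfy 0 <= low <= high <= 65535")
    return high - low + 1


def linux_ephemeral_range():
    """Return the configured (low, high) range on Linux, or None elsewhere."""
    path = Path("/proc/sys/net/ipv4/ip_local_port_range")
    if not path.exists():
        return None
    low, high = path.read_text().split()
    return int(low), int(high)


if __name__ == "__main__":
    print("Linux default:", ephemeral_port_count(32768, 60999))
    print("IANA range:   ", ephemeral_port_count(49152, 65535))
    configured = linux_ephemeral_range()
    if configured is not None:
        print("this machine: ", ephemeral_port_count(*configured))
```

Each count is the upper bound on simultaneous connections from one source IP to one destination IP and port, before any other resource limit comes into play.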

This leads to a question worth chewing on. Someone says, "I have a server with an IP address, and I am handling a hundred thousand concurrent TCP connections on it." Is that true or false?

The honest answer is: it depends on what you mean. The 64K limit applies per source — every distinct remote client gives you a fresh space of 65,536 possible source ports. So a server receiving connections from a hundred thousand different clients can absolutely have a hundred thousand sockets open, because each client has its own port space. The limit you hit is RAM, CPU, file descriptors, kernel buffers, and (eventually) the actual port pool used by your own outbound connections if your server is also acting as a client to a downstream service.

Now flip it. One client, talking to one (IP, port) pair on the server. You cannot have more than a few tens of thousands of concurrent connections from that single client: the ceiling is the size of its ephemeral port range, and always below 65,536. This is why connection pools, multiplexing protocols (HTTP/2, gRPC), and persistent reuse exist. The 64K trap is real, but it is on the client side of the four-tuple, not the server's listening socket.
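To see that the source port really is the only thing that varies, here is a small sketch that opens several connections from one client to one listener over loopback and collects their four-tuples. It assumes Python's standard socket module, and open_connections is a hypothetical helper name.

```python
import socket


def open_connections(n):
    """Open n TCP connections to one loopback listener.

    Returns a list of (source IP, source port, dest IP, dest port) tuples
    as seen from the client side.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # OS picks the listening port
    listener.listen(n)
    target = listener.getsockname()

    clients = [socket.create_connection(target) for _ in range(n)]
    accepted = [listener.accept()[0] for _ in range(n)]

    tuples = [c.getsockname() + c.getpeername() for c in clients]

    for s in clients + accepted + [listener]:
        s.close()
    return tuples


if __name__ == "__main__":
    for t in open_connections(5):
        print(t)
```

Every row shares the source IP, destination IP, and destination port. Only the second field changes, and that field is drawn from the finite ephemeral range.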

Senior engineers carry this distinction in their head. If a teammate confidently asserts that "a server can only have 65,536 open connections," they have flattened a per-source limit into a global one. Go check.

Sockets, file descriptors, and the way Unix sees the network.

In Unix and Unix-like systems (Linux, macOS), a socket is a file descriptor — the same abstraction the kernel uses for open files. Reading from a socket is the same system call as reading from a file. Writing to it is the same system call as writing to a file.

This is one of the more elegant ideas in the history of operating systems: networks, files, devices, and pipes are all unified under one abstraction. A program that knows how to read and write to a file already knows, in a sense, how to read and write to the network. The kernel handles the difference between "data from disk" and "data from the network" beneath the abstraction.
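The shared abstraction is easy to demonstrate. In this sketch, which assumes a Unix-like system (it will not work as written on Windows), the same helper moves bytes through a connected socket pair and through a plain pipe using nothing but raw file-descriptor calls. The name write_then_read is illustrative.

```python
import os
import socket


def write_then_read(write_fd: int, read_fd: int, payload: bytes) -> bytes:
    """Write to one file descriptor and read from another.

    os.write and os.read map to the write(2) and read(2) system calls;
    they neither know nor care what kind of object the descriptor names.
    """
    os.write(write_fd, payload)
    return os.read(read_fd, len(payload))


def demo():
    # A pair of connected sockets, each exposing an integer file descriptor.
    a, b = socket.socketpair()
    via_socket = write_then_read(a.fileno(), b.fileno(), b"hello")
    a.close()
    b.close()

    # An ordinary pipe, which is just two file descriptors.
    r, w = os.pipe()
    via_pipe = write_then_read(w, r, b"hello")
    os.close(r)
    os.close(w)

    return via_socket, via_pipe


if __name__ == "__main__":
    print(demo())
```

Socket-specific calls such as recv and send still exist for options that plain read and write cannot express, but the basic data path is the same.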

We are touching on something here that will become important in the next chapter: when data arrives on the network card, it does not magically appear inside your Node.js process. There is an entire journey from "electrical pulses on a copper wire" to "string in JavaScript." That journey is the bridge between hardware and software. It is also one of the most under-taught topics in modern engineering education, and it is where most of the truly important performance work happens.

Take a breath. The next chapter is where it gets interesting.

Push On It

  1. On your machine, list every port currently in use. (netstat -an or ss -tulnp on Linux, lsof -i -P -n on macOS, netstat -ano on Windows.) Identify what process owns each one. You will be surprised how many things on your machine are listening.
  2. Read about SO_REUSEADDR and SO_REUSEPORT and figure out the practical difference. When would you use each?
  3. Write a tiny TCP echo server in your language of choice. Make it listen on a port. Connect to it with telnet or nc. Now do the same with UDP. Notice what is different between the two.
  4. Look up the difference between the server limit on concurrent TCP sockets and the client-side 64K source-port limit. Find one place in your stack where the client-side limit might bite you (hint: outbound connection pools to a single downstream service).
