DNS, the Quiet Backbone

Type a URL into a browser. Press enter. Something on the other side answers. The piece in the middle, the one that translates the human-readable name into a numeric address the network can route to, is the Domain Name System. It is one of the most important pieces of infrastructure ever built, and it is, simultaneously, one of the least understood.

This chapter is about how DNS actually works, and how it got to be this way.

Before there was DNS, there was a file.

I have been on the internet since the dial-up era. In those days, the first thing your modem did was connect to a Linux terminal at a telecom exchange. There was no GUI. No surfing. You typed commands into a shell and you read text back. The number of domain names in the world was small enough that no one needed a "system" for resolving them.

What we did have, and what every Unix machine still has, is a flat file: /etc/hosts. It is a list of hostname-to-IP mappings, one per line. Open it on your laptop right now. It will have at least one entry: 127.0.0.1 localhost.
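The format is simple enough to parse in a few lines. Here is a minimal sketch in Python, assuming the usual layout of one address followed by one or more names, with # starting a comment. The sample text and the parse_hosts helper are made up for illustration; nothing is read from your real file, and letting the first entry win is a simplifying choice for this sketch.

```python
def parse_hosts(text):
    """Return a dict mapping each hostname to the address on its line."""
    mapping = {}
    for raw_line in text.splitlines():
        # Anything after '#' is a comment.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no names: nothing to map
        ip, names = fields[0], fields[1:]
        for name in names:
            # Simplifying assumption: the first entry for a name wins.
            mapping.setdefault(name.lower(), ip)
    return mapping


# Hypothetical sample content, not read from disk.
SAMPLE_HOSTS = """\
# comment line
127.0.0.1    localhost
::1          localhost ip6-localhost
203.0.113.42 kumarpratik.com www.kumarpratik.com  # illustrative entry
"""

hosts = parse_hosts(SAMPLE_HOSTS)
print(hosts["localhost"])
print(hosts["www.kumarpratik.com"])
```

To try it on your own machine, pass the contents of /etc/hosts to parse_hosts instead of the sample string.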

In the early internet, that file was the entire global directory. Every machine that participated in the network maintained its own copy. When a new domain got created, the world exchanged updated files — overnight, by FTP, over slow lines. Downloading one megabyte took the better part of an hour.

This worked for a while. It does not scale. As the list of names grew, the file grew with it. By the early eighties it was obvious that this approach was about to collapse under its own weight. Something distributed had to replace it.

The first generation: a database with a query interface.

The first move was to take the flat file and put it behind a database. Instead of every machine downloading the whole world's hostnames, you would query a name server for the specific name you cared about. Less data on the wire. Faster updates. One server speaks for the world.

This solved the file-size problem and introduced two new ones.

The first was load. The DNS server that holds the world's mapping is now answering every lookup from every machine on the planet. A single server cannot do that. It will buckle under the query rate, and it is a single point of failure.

The second was authority. Who is allowed to own kumarpratik.com? Who is allowed to own .org or .net? Initially there was one organization in the middle, fielding registration requests from a growing internet. That organization became a bottleneck almost immediately.

The answer to both problems is the same answer most distributed systems eventually arrive at: a hierarchy of caches and authorities.

Today: a tree of caches that lazily ask upward.

The modern DNS is a tree.

At the top sit the root servers — a small number of authoritative machines (replicated globally) that know who is responsible for each top-level domain: .com, .org, .in, .dev, and so on. They do not know what IP google.com resolves to. They know which name servers to ask about anything ending in .com.

Below the root sit the TLD servers, one set per top-level domain. The .com TLD servers know which name servers are authoritative for every domain ending in .com. Again, they do not know the answer themselves; they know who to refer you to.

Below the TLD sit the authoritative name servers — the ones the domain owner actually configured. When I registered kumarpratik.com, my registrar wrote a record at the .com TLD server saying "the authoritative DNS for kumarpratik.com lives at these addresses." Inside my authoritative server, I configured the actual records: kumarpratik.com points to this IP, mail.kumarpratik.com is a CNAME for that, and so on.

And then, in front of all of that, sit caching resolvers. Your laptop is not configured to ask the root directly. Your DHCP-assigned resolver is. Usually it is your home router, your ISP's local cache, or a public one like 8.8.8.8 or 1.1.1.1. When you ask it for kumarpratik.com, it first checks if it already knows. If it does, you get the answer in a millisecond. If it doesn't, it walks the hierarchy on your behalf: root → TLD → authoritative → answer. Then it caches what it learned and serves the next person in milliseconds.
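The walk described above can be sketched as a toy model. This is not how a real resolver is implemented: the three dictionaries stand in for the root, TLD, and authoritative servers, the server labels and the CachingResolver class are invented for illustration, and the cache ignores TTLs entirely.

```python
# Toy data for illustration only: server labels and addresses are made up.
ROOT = {"com": "tld-com"}  # root: TLD -> which TLD server to ask
TLD = {"tld-com": {"kumarpratik.com": "ns-kumarpratik"}}  # TLD server: domain -> authoritative server
AUTHORITATIVE = {"ns-kumarpratik": {"kumarpratik.com": "203.0.113.42"}}


class CachingResolver:
    def __init__(self):
        self.cache = {}
        self.upstream_queries = 0  # counts queries that left the cache

    def resolve(self, name):
        # Cache hit: answer locally, no upstream traffic.
        if name in self.cache:
            return self.cache[name]
        # Cache miss: walk root -> TLD -> authoritative.
        tld_label = name.rsplit(".", 1)[-1]
        tld_server = ROOT[tld_label]                 # ask the root
        auth_server = TLD[tld_server][name]          # ask the TLD server
        address = AUTHORITATIVE[auth_server][name]   # ask the authoritative server
        self.upstream_queries += 3
        self.cache[name] = address
        return address


resolver = CachingResolver()
print(resolver.resolve("kumarpratik.com"), resolver.upstream_queries)
print(resolver.resolve("kumarpratik.com"), resolver.upstream_queries)
```

The second call returns the same address without adding to upstream_queries, which is the whole point of putting a cache in front of the tree.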

This cascade is one of the most important reasons the internet works at all. Most lookups never reach the authoritative server. They are answered by a cache somewhere upstream of the user, often in the same building. The authoritative servers see only the small fraction of queries that have not been seen recently.

Records — what actually lives in DNS.

When you log into your registrar's dashboard, you are looking at records. The ones you will meet most often:

A — maps a name to an IPv4 address. kumarpratik.com → 203.0.113.42.
AAAA — same as A but for IPv6.
CNAME — an alias. Instead of pointing the name to an IP, you point it to another name. www.kumarpratik.com → kumarpratik.com. The resolver then resolves the target. CNAMEs are cheap to update — change the IP of the target and every alias follows.
NS — name server. Says "the authoritative DNS for this domain lives at these hostnames." This is the record the TLD server consults when it points your resolver downstream.
TXT — arbitrary text. Used for SPF (anti-spam), DKIM (mail signing), and a hundred verification flows where some service wants you to prove you own the domain by putting a token here.
MX — mail exchange. Where to deliver mail for this domain.

A record carries data and a TTL. The TTL is the contract: "downstream caches are allowed to hold this for at most this many seconds before re-checking with me."
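To make the shape of a record concrete, here is a small sketch that models records as data and follows a CNAME to its target. The Record class, the ZONE list, and lookup_a are hypothetical names for this example, and the hop limit is an arbitrary guard against alias loops, not a value taken from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    rtype: str  # "A", "AAAA", "CNAME", ...
    data: str
    ttl: int    # seconds


# Illustrative zone contents only.
ZONE = [
    Record("kumarpratik.com", "A", "203.0.113.42", 3600),
    Record("www.kumarpratik.com", "CNAME", "kumarpratik.com", 3600),
]


def lookup_a(name, zone, max_hops=8):
    """Follow CNAME aliases until an A record is found; None if nothing matches."""
    for _ in range(max_hops):
        for rec in zone:
            if rec.name == name and rec.rtype == "A":
                return rec.data
        alias = next(
            (r for r in zone if r.name == name and r.rtype == "CNAME"), None
        )
        if alias is None:
            return None
        name = alias.data  # resolve the alias target next
    raise RuntimeError("CNAME chain too long or looping")


print(lookup_a("www.kumarpratik.com", ZONE))
```

Changing the A record's data in ZONE changes what every alias resolves to, which is why CNAMEs are cheap to maintain.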

TTL — the contract between you and the caches.

When you set a TTL on a DNS record, you are negotiating with every cache between you and your users.

A short TTL, say 60 seconds, means that if you change the IP, the world catches up within about a minute. The cost is that caches keep coming back to your authoritative server every minute. Your authoritative server sees more load.

A long TTL, say 24 hours, means caches answer locally for a full day before checking back. Your authoritative server sees almost no traffic. The cost is that when you change the IP, some fraction of users keep hitting the old address for up to a day.
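The trade-off is easier to see with a cache you can step through. The sketch below is a minimal TTL cache, assuming a well-behaved cache that drops an entry exactly when its TTL runs out. The TtlCache class is a made-up name, and the clock is injected so the example can move time forward without sleeping.

```python
class TtlCache:
    def __init__(self, clock):
        self._clock = clock   # function returning the current time in seconds
        self._entries = {}    # name -> (value, expires_at)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: forget it, the caller must ask upstream again.
            del self._entries[name]
            return None
        return value


now = [0]  # fake clock, in seconds
cache = TtlCache(lambda: now[0])
cache.put("kumarpratik.com", "203.0.113.42", ttl=60)

now[0] = 59
print(cache.get("kumarpratik.com"))  # still within the TTL
now[0] = 60
print(cache.get("kumarpratik.com"))  # TTL has run out
```

With ttl=60 the cache goes back upstream every minute; with ttl=86400 the same entry would be served locally for a day.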

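This is why the rule of thumb is: lower the TTL well before a planned change, at least one full old-TTL period ahead, so that every cache still holding the long-lived copy has expired it and picked up the short one. Make the change, then raise the TTL again afterward.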
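The arithmetic behind that rule of thumb is short. This sketch assumes caches honor TTLs exactly, which the next paragraph warns is optimistic; plan_ttl_change and its parameter names are hypothetical, and all times are in seconds.

```python
def plan_ttl_change(old_ttl, low_ttl, change_at):
    """Return (lower_by, stale_until) for a planned record change.

    lower_by:    latest moment to publish the low TTL, so that every cached
                 copy carrying the old TTL has expired before the change.
    stale_until: moment after which well-behaved caches hold the new value.
    """
    lower_by = change_at - old_ttl
    stale_until = change_at + low_ttl
    return lower_by, stale_until


DAY = 86400
change_at = 7 * DAY  # hypothetical: the change happens one week from "time zero"

lower_by, stale_until = plan_ttl_change(old_ttl=DAY, low_ttl=60, change_at=change_at)
print("lower the TTL no later than t =", lower_by)
print("old answer may linger until t =", stale_until)
print("without lowering, it could linger until t =", change_at + DAY)
```

The gap between the last two numbers is what the preparation buys you: a stale window of a minute instead of a day.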

Caches sometimes do not honor the TTL exactly. Some hold records longer than you asked them to, to save themselves traffic. Some refresh more aggressively. The TTL is a target, not a guarantee. Plan for the long tail.

The handoff from DHCP to DNS.

When your laptop joins a network, DHCP hands it more than just an IP address. It also hands it the address of the DNS resolver to use, the default gateway, the lease duration, and a few other things. Your operating system writes the resolver address into its network configuration, and every name lookup from then on goes there first.
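On many Unix-like systems, the resolver address ends up as a nameserver line in /etc/resolv.conf; other systems keep the setting elsewhere. As a sketch under that assumption, the snippet below pulls the resolver addresses out of text in that format. The sample contents and the nameservers helper are invented for illustration.

```python
def nameservers(resolv_conf_text):
    """Return the resolver addresses listed on 'nameserver' lines, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop '#' comments; lines starting with ';' never match 'nameserver'.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Hypothetical contents, as they might look after a DHCP lease.
SAMPLE_RESOLV_CONF = """\
# written by the network configuration
search home.example
nameserver 192.0.2.1
nameserver 1.1.1.1
"""

print(nameservers(SAMPLE_RESOLV_CONF))
```

Whatever address comes first in that list is where your lookups go first, which is why the network that hands out the lease has so much say over what you resolve.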

This is also why, on an untrusted network, DNS spoofing is a viable attack. If the network you joined is hostile, the resolver it handed you is a resolver it controls. It can return any answer it wants. This is one of the reasons HTTPS matters: the certificate check happens after DNS resolution, and a server at the wrong IP cannot present a valid certificate for the name you asked for. The cryptography defends what the directory cannot.

Two DHCP servers, briefly.

A question that comes up: what happens if there are two DHCP servers on the same network?

It depends. They may hand out conflicting addresses — chaos. They may hand out non-overlapping ranges — fine. They may compete for the same requests — race conditions and intermittent failures. In well-run office and data-center networks, you partition the address space carefully and configure failover behaviour. Concept: split scope — primary DHCP serves 80% of the range, secondary serves the other 20%, and either can pick up if the other is down.
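The 80/20 split is plain integer arithmetic over the address range. Here is a sketch using Python's ipaddress module; the split_scope function, the sample range, and the default percentage are assumptions for illustration, and a real deployment would configure this in the DHCP servers themselves.

```python
import ipaddress


def split_scope(first, last, primary_percent=80):
    """Split an inclusive IPv4 range into (primary, secondary) sub-ranges."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    total = end - start + 1
    primary_count = total * primary_percent // 100
    if not 0 < primary_count < total:
        raise ValueError("split leaves one server with no addresses")
    cut = start + primary_count  # first address of the secondary range
    primary = (ipaddress.IPv4Address(start), ipaddress.IPv4Address(cut - 1))
    secondary = (ipaddress.IPv4Address(cut), ipaddress.IPv4Address(end))
    return primary, secondary


# Hypothetical pool of 100 addresses.
primary, secondary = split_scope("192.0.2.100", "192.0.2.199")
print("primary:  ", primary[0], "to", primary[1])
print("secondary:", secondary[0], "to", secondary[1])
```

Because the two ranges never overlap, the servers cannot hand out the same address, no matter which one answers a given request first.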

The lesson, again, is the same one as in the MAC chapter: do not assume any invariant is absolutely true. Networks routinely run two DHCP servers by design, and they routinely break when someone runs two by accident.

Push On It

Open /etc/hosts on your machine. Read every line. What is each entry doing? Which of them came with the OS, and which did you (or some installer) put there?
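Run dig +trace example.com or dig +trace your-favourite-domain. Watch the resolution walk through root, TLD, and authoritative. Note that +trace does the walk itself and bypasses your resolver's cache. Then run plain dig example.com twice against your usual resolver. Compare the query time and the remaining TTL between the two runs, and explain to yourself which cache served you.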
Pick a domain you own (or dig someone else's). List all the records on it. For each one, write down what would break if that record disappeared.
Lower the TTL on a record you control to 60 seconds, wait at least as long as the old TTL, change the value, and time how long until every resolver you can find returns the new answer. Bring the data back.
