Part 4 | 20 min read

The Linux Way

Why "everything is a file" is one of the most consequential ideas in computing, and how a handful of simple APIs let Linux run everything from a toaster to a supercomputer.

What you will learn

  • Explain what "everything is a file" actually means and what it buys you
  • Read and write file permissions in both symbolic and octal form
  • Recognise the standard Linux filesystem layout (`/etc`, `/var`, `/usr`, `/tmp`, ...)
  • Understand what init systems and package managers do, and how Linux distributions differ

There are many operating systems in the world. There is, depending on how you count, exactly one that runs the majority of the internet, the majority of the world's phones, the majority of supercomputers, the majority of cloud servers, and a remarkable number of toasters. That OS is Linux, and the reason it has won so much of computing has very little to do with its desktop UI and a great deal to do with a small set of design choices that have aged extraordinarily well.

This chapter is about those choices.

Everything is a file.

This is the most famous line in Unix philosophy, and one of the most consequential ideas in computer science. It deserves to be quoted carefully and understood literally.

In Linux, the way you interact with anything — a file on disk, a directory, the keyboard, the screen, a USB device, a process's memory, a network socket, a kernel parameter — is through the same handful of system calls: open, read, write, close. The thing you are interacting with might be a 4 KB text file on an SSD. It might be a hundred gigabytes of RAM in a process. It might be your CPU's current temperature, exposed by the kernel as a file in /sys/class/thermal/. It might be a connection to a remote server, halfway around the world.

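The application code does not need to know the difference. Once it holds a file descriptor, it reads and writes the same way whatever is behind it. (How the descriptor is obtained can vary: a network connection, for instance, comes from socket() rather than open().)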

This is the magic. By unifying all I/O behind the file abstraction, Linux made a vast number of disparate things composable. You can pipe the output of one program into the input of another. You can redirect any program's output to a file or a network socket. You can write a 10-line shell script that combines disk I/O, process management, and remote communication, because every one of them is, at the syscall level, just bytes flowing through file descriptors.

Take a moment with this. It is not a metaphor. It is the actual API.
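To make that concrete, here is a minimal Python sketch in which one read loop serves two very different sources: a regular file on disk and a pipe. The helper names `read_all` and `demo` are invented for this illustration; the `os` calls are thin wrappers over the open, read, write and close syscalls described above.

```python
import os
import tempfile


def read_all(fd: int, chunk_size: int = 4096) -> bytes:
    """Read from any file descriptor until end-of-file."""
    chunks = []
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:  # an empty result means end-of-file
            break
        chunks.append(chunk)
    return b"".join(chunks)


def demo() -> tuple[bytes, bytes]:
    # Source 1: a regular file on disk.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello from a file\n")
        path = tmp.name
    file_fd = os.open(path, os.O_RDONLY)
    try:
        from_file = read_all(file_fd)
    finally:
        os.close(file_fd)
        os.unlink(path)

    # Source 2: a pipe, the same mechanism the shell uses for `a | b`.
    read_end, write_end = os.pipe()
    os.write(write_end, b"hello from a pipe\n")
    os.close(write_end)  # closing the write end lets the reader see EOF
    try:
        from_pipe = read_all(read_end)
    finally:
        os.close(read_end)

    return from_file, from_pipe


if __name__ == "__main__":
    print(demo())
```

Note that `read_all` never asks what kind of object the descriptor refers to. That is the whole point.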

The VFS — many filesystems, one interface.

Sitting just below the "everything is a file" abstraction is another quiet piece of brilliance: the Virtual File System (VFS).

A filesystem is the structure that organises bytes on a storage device — ext4, XFS, btrfs, NTFS, FAT32, ZFS, and dozens of others. Each one stores data differently on disk. Each one has its own performance characteristics, recovery strategies, and metadata.

Without VFS, every application that wanted to read a file would need to know which filesystem the file was on, and how to parse that filesystem's on-disk format. That would be a disaster.

With VFS, the kernel presents one interface — the file syscalls — and routes each request to the appropriate underlying filesystem driver. Your application opens /home/sanket/foo.txt. The VFS sees that /home is mounted as ext4, dispatches to the ext4 driver, fetches the bytes, and hands them back. The application has no idea ext4 was involved. Tomorrow you mount that home directory over NFS — the application still works without a recompile, because the VFS now dispatches to the NFS driver instead.
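You can watch this routing from user space. On Linux the kernel lists mounted filesystems in /proc/mounts, one per line, as device, mount point, filesystem type and options. The sketch below picks the longest mount point that contains a given path, which is roughly the question the VFS answers on every open. It is a user-space illustration only, not how the kernel implements the lookup. The function name `fs_type_for` and the sample mount table are made up for the example, and mount points containing spaces (which /proc/mounts escapes) are not handled.

```python
from typing import Optional


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the deepest mount containing `path`."""
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" so that a later mount on the same point wins
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


# A made-up mount table in the /proc/mounts layout.
SAMPLE_MOUNTS = """\
/dev/sda1 / xfs rw,relatime 0 0
proc /proc proc rw,nosuid 0 0
/dev/sdb1 /home ext4 rw,relatime 0 0
server:/export /home/shared nfs rw 0 0
"""

if __name__ == "__main__":
    for p in ("/home/sanket/foo.txt", "/home/shared/report", "/proc/cpuinfo"):
        print(p, "->", fs_type_for(p, SAMPLE_MOUNTS))
```

On a real machine you could pass the contents of /proc/mounts instead of `SAMPLE_MOUNTS`.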

The same VFS layer is what lets /proc look like a filesystem even though it is not stored on disk at all. When you cat /proc/cpuinfo, you are not reading bytes from a disk. You are asking the kernel a question, and the kernel is composing an answer in real time and handing it back through the file API. The same is true of /sys, which exposes nearly every aspect of kernel state as files you can read and (sometimes) write.

This is why Linux feels so coherent. Once you understand the file API, you can introspect almost any subsystem in the kernel just by reading a file.
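As a small illustration of introspection by reading, the sketch below parses text in the /proc/meminfo layout, where each line is a key, a colon, and a number (most values are reported in kB). The function name `parse_meminfo` is invented for this example. The script reads the real file only if it exists, so on a non-Linux machine it simply prints nothing.

```python
import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Turn 'Key:   12345 kB' lines into {key: 12345}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'key: value' line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):
        # An ordinary open() and read(); the kernel generates the text on demand.
        with open(path) as f:
            mem = parse_meminfo(f.read())
        print("MemTotal (kB):", mem.get("MemTotal"))
```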

File descriptors — the unified handle.

When you open a file, the kernel returns an integer: the file descriptor. It is small (often a number like 3, 4, 5) and it is your handle to that file from then on. Every read and write references it.

A process can have many file descriptors open at once. By convention, file descriptor 0 is standard input, 1 is standard output, and 2 is standard error. Everything you open after that gets the lowest integer not currently in use, so a closed descriptor's number is reused by the next open.

Network sockets are file descriptors. Pipes are file descriptors. Open directories are file descriptors. The kernel does not care what's behind them. From its point of view, you have a handle, and you are doing I/O on it.
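A short sketch, assuming a POSIX system and a single-threaded process, shows that pipes, sockets and open directories all come back as plain integers, and that a closed descriptor's number is handed out again by the next open. The function name `descriptor_demo` and the dictionary keys are invented for the example.

```python
import os
import socket


def descriptor_demo() -> dict[str, int]:
    """Open several kinds of object and report their descriptor numbers."""
    pipe_read, pipe_write = os.pipe()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # never connected
    dir_fd = os.open(".", os.O_RDONLY)  # an open directory (POSIX)

    result = {
        "pipe_read": pipe_read,
        "pipe_write": pipe_write,
        "socket": sock.fileno(),
        "directory": dir_fd,
    }

    # Close the directory, then open something else: the kernel hands out
    # the lowest unused number, which is the one we just released.
    os.close(dir_fd)
    reopened = os.open(os.devnull, os.O_RDONLY)
    result["reopened"] = reopened

    # Clean up so the demo itself does not leak descriptors.
    os.close(reopened)
    sock.close()
    os.close(pipe_read)
    os.close(pipe_write)
    return result


if __name__ == "__main__":
    print(descriptor_demo())
```

The exact numbers depend on what the process already had open, so the script prints them rather than assuming particular values.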

This is also why, when a senior engineer says "I'm out of file descriptors," they are saying something specific: the process has hit the limit on simultaneously open files (including sockets). You can raise the limit (ulimit -n), but you should also figure out why the process is leaking them — usually because something is opening connections without closing them.

The filesystem layout — where things actually live.

A standard Linux system organises its files in a specific tree. You should know the major branches.

  • / — the root. Everything lives somewhere below it.
  • /bin — essential binary executables, available even in single-user mode.
  • /sbin — system binaries, generally for the root user.
  • /etc — system-wide configuration files. The config file for nearly every daemon lives here.
  • /home — user home directories. Your personal files, your shell history, your dotfiles.
  • /usr — installed software that came with the distribution.
  • /usr/local — software you installed yourself, separate from the package manager.
  • /var — variable data: logs (/var/log), spool files, runtime state.
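  • /tmp — temporary files. Often cleared on reboot or by a periodic cleanup job, depending on the distribution, so never keep anything here you cannot afford to lose. Anyone can write here.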
  • /proc — virtual filesystem exposing process information and kernel state.
  • /sys — virtual filesystem exposing device and driver state.
  • /dev — device files. Your disks, your terminals, your random number generator.

Once you have these in your head, you can find almost anything on a Linux system within seconds. If a daemon is misbehaving, you check /etc for its config, /var/log for its logs, /proc or systemctl for its state, and /dev for whatever device it claims is broken.

Permissions, in three groups and three actions.

Every file in Linux belongs to a user and a group. Every file has three permission sets — for the owner, for the group, for everyone else — and each set has three flags: read, write, execute.

You write them as rwxr-xr-x (owner: rwx, group: r-x, other: r-x) or as octal 755 (4+2+1, 4+0+1, 4+0+1). Most engineers memorise the octals because they are quicker:

  • 7 (rwx) — full access.
  • 6 (rw-) — read and write, no execute.
  • 5 (r-x) — read and execute, common for scripts and binaries that everyone runs.
  • 4 (r--) — read only.
  • 0 (---) — nothing.

So chmod 644 some-file means owner can read and write, everyone else can only read. chmod 700 my-script.sh means only the owner can do anything with it. chown alice:devs my-file changes the owning user to alice and the owning group to devs.
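The octal-to-symbolic arithmetic is easy to check in code. The sketch below converts a mode by hand, three bits per group, and then applies a mode to a temporary file and reads it back through the standard library. The function names `octal_to_symbolic` and `chmod_and_report` are invented for this example, and the second one assumes a POSIX system.

```python
import os
import stat
import tempfile


def octal_to_symbolic(mode: int) -> str:
    """Convert e.g. 0o755 to 'rwxr-xr-x' (owner, group, other)."""
    parts = []
    for shift in (6, 3, 0):  # owner, then group, then other
        bits = (mode >> shift) & 0o7
        parts.append(
            ("r" if bits & 4 else "-")
            + ("w" if bits & 2 else "-")
            + ("x" if bits & 1 else "-")
        )
    return "".join(parts)


def chmod_and_report(mode: int) -> str:
    """chmod a scratch file and return what the filesystem now reports."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        os.chmod(path, mode)
        # stat.filemode adds a leading file-type character ('-' for a regular file)
        return stat.filemode(os.stat(path).st_mode)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    for m in (0o755, 0o644, 0o700):
        print(oct(m), octal_to_symbolic(m))
    print(chmod_and_report(0o640))
```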

That entire permission system fits in fewer than fifty words, and it has been load-bearing for decades. When SELinux and AppArmor and capability systems came along, they augmented it; none of them have replaced it. The simplicity is the point.

Init systems — how userspace starts.

After the kernel finishes booting, it needs to start the first user-space process. That process is responsible for starting every other service on the system.

For thirty years, that program was init, and it ran shell scripts from /etc/init.d in a fixed order based on numbered "runlevels" (0 = halt, 1 = single user, 3 = multi-user with networking, 5 = graphical, 6 = reboot). It was simple, but it was sequential — if one service was slow to start, every later service waited for it.

Modern Linux distributions use systemd. It is one of the most controversial pieces of software in the ecosystem, but technically it has clear wins: services are declared as unit files with explicit dependency relationships, services start in parallel where their dependencies allow, and systemd supervises them — restarting on crash, journaling their logs, gating their resource usage. You interact with it through systemctl start nginx, systemctl status sshd, journalctl -u my-service.
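The idea of starting services in parallel where dependencies allow can be sketched with a topological sort. This is only an illustration of the concept, not systemd's actual scheduling algorithm. The service names and their dependencies are hypothetical, and `start_waves` is a name made up for the example; `graphlib` is in the Python standard library from 3.9 onwards.

```python
from graphlib import TopologicalSorter


def start_waves(requires: dict[str, set[str]]) -> list[list[str]]:
    """Group services into waves; everything in a wave can start together."""
    sorter = TopologicalSorter(requires)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies already started
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as started
    return waves


# Hypothetical services: each key maps to the services it must wait for.
HYPOTHETICAL_UNITS = {
    "network": set(),
    "syslog": set(),
    "postgresql": {"network"},
    "nginx": {"network"},
    "webapp": {"postgresql", "nginx"},
}

if __name__ == "__main__":
    for number, wave in enumerate(start_waves(HYPOTHETICAL_UNITS), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

A sequential init would need five steps for these five services; with explicit dependencies, three waves are enough.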

A few distributions hold out — Alpine uses OpenRC, some embedded systems still use raw init scripts. But on the servers you will encounter at work, systemd is the default.

Package managers — installing software without ruining the system.

Linux distributions are, mostly, the same kernel plus the same userland tools plus different packaging choices. A package manager is the program that installs software, tracks what's installed, and resolves dependencies.

  • APT (Debian, Ubuntu) — uses .deb packages. apt install nginx. apt update, apt upgrade.
  • RPM family (Fedora, RHEL, SUSE) — uses .rpm packages, managed by dnf or yum or zypper. dnf install nginx.
  • Pacman (Arch) — uses .pkg.tar.zst packages. pacman -S nginx.
  • APK (Alpine) — minimal, fast, popular in containers. apk add nginx.

When you ask a package manager to install something, it checks the package's declared dependencies, finds (and installs) any missing ones, downloads everything, verifies cryptographic signatures, runs install scripts, and updates its database. If there is a version conflict — package A needs version 1 of a library, package B needs version 2 — the package manager either resolves it (sometimes both versions coexist) or refuses to proceed.
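The dependency-resolution step can be sketched as a depth-first walk that installs dependencies before the packages that need them and skips anything already present. Real package managers also handle versions, conflicts and signatures, none of which appear here. The package index below is made up for the illustration (it is not nginx's real dependency list), and `install_plan` is a hypothetical helper.

```python
def install_plan(package: str, index: dict[str, list[str]], installed: set[str]) -> list[str]:
    """Return missing packages in an order where dependencies come first."""
    plan: list[str] = []
    visiting: set[str] = set()

    def visit(name: str) -> None:
        if name in installed or name in plan:
            return  # nothing to do
        if name not in index:
            raise LookupError(f"unknown package: {name}")
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name}")
        visiting.add(name)
        for dependency in index[name]:
            visit(dependency)
        visiting.discard(name)
        plan.append(name)  # appended only after all its dependencies

    visit(package)
    return plan


# A made-up index: package name -> names it depends on.
HYPOTHETICAL_INDEX = {
    "nginx": ["libssl", "libpcre"],
    "libssl": ["libc"],
    "libpcre": ["libc"],
    "libc": [],
}

if __name__ == "__main__":
    print(install_plan("nginx", HYPOTHETICAL_INDEX, installed={"libc"}))
```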

This is why developers sometimes ship their applications with all dependencies bundled inside the package itself. The application becomes self-contained, immune to the chaos of the host system's library versions. Docker, which we will meet two chapters from now, takes this idea to its logical conclusion: ship the entire userland.

Why Linux distributions differ.

It is sometimes surprising how much variation there is across "Linux." The kernel is the same. The userland is largely the same. So what is different?

Three things, mostly.

The init system (systemd vs OpenRC vs runit), which determines how services start.

The package manager and format (APT/dpkg vs RPM/dnf vs APK), which determines how software is distributed and installed.

The userspace conventions — which directories things live in, which utilities are preinstalled, what the default shell is, what the default text editor is, how aggressively they patch upstream packages.

You will hear engineers argue about which distribution is "best." The honest answer is that for any given task, two or three of them are equally good, and the choice is mostly about familiarity and the ecosystem around the server. Ubuntu dominates cloud images. Alpine dominates containers. RHEL/CentOS/Rocky dominate enterprise. Pick the one your team already uses, learn its idioms, and stop arguing.

sysctl — the kernel's tuning knobs.

One last thing before we close out. The kernel exposes hundreds of tunable parameters through sysctl. These are knobs that change how the kernel behaves at runtime: how aggressively it swaps, how many file descriptors a process can open, how much memory it reserves for network buffers, whether it forwards IP packets, and on and on.

You read them with sysctl -a (or by reading files under /proc/sys/). You write them with sysctl -w net.ipv4.ip_forward=1 (or by writing to the appropriate /proc/sys/ file). You persist them in /etc/sysctl.conf or /etc/sysctl.d/.
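The correspondence between a sysctl name and its file is mechanical in the common case: dots become slashes under /proc/sys/. The sketch below assumes that simple mapping; names whose components themselves contain dots, such as some network interface names, need more care. The helpers `sysctl_path` and `read_sysctl` are invented for this example, and the `root` parameter exists only so the code can be tried against a scratch directory.

```python
from typing import Optional


def sysctl_path(name: str, root: str = "/proc/sys") -> str:
    """Map 'net.ipv4.ip_forward' to '/proc/sys/net/ipv4/ip_forward'."""
    return root + "/" + name.replace(".", "/")


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Read a parameter as text, or return None if it is unavailable."""
    try:
        with open(sysctl_path(name, root)) as f:
            return f.read().strip()
    except OSError:
        return None  # not Linux, no such parameter, or no permission


if __name__ == "__main__":
    print(sysctl_path("net.ipv4.ip_forward"))
    print(read_sysctl("net.ipv4.ip_forward"))
```

Writing works the same way in reverse, but it needs root and changes live kernel behaviour, so it is left out of the sketch.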

Alongside sysctl, each process also has resource limits. Every limit has a soft value, which is what is enforced, and a hard value, which is the ceiling the soft value can be raised to (ulimit -Sn, ulimit -Hn). There are per-user limits in /etc/security/limits.conf. When an application "hits a limit" (too many open files, too many threads, too much locked memory), the answer is often somewhere in this layer. The OS is being a bodyguard. You either persuade the bodyguard with a higher limit, or you figure out why your application needed so much in the first place.
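A process can inspect and adjust its own limits from code. The sketch below uses Python's standard `resource` module (Unix only) to read the soft and hard limits on open files, the same pair that ulimit -Sn and ulimit -Hn report, and to move the soft limit without exceeding the hard one. The helper names `open_file_limits`, `describe` and `set_soft_limit` are invented for this example.

```python
import resource


def open_file_limits() -> tuple[int, int]:
    """Return (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(value: int) -> str:
    """Render a limit the way ulimit does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def set_soft_limit(new_soft: int) -> int:
    """Set the soft limit, clamped to the hard limit; return the value used."""
    _, hard = open_file_limits()
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)  # an unprivileged process cannot exceed hard
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft:", describe(soft), "hard:", describe(hard))
```

Raising the soft limit only buys headroom. If the process is leaking descriptors, it will reach the new number too.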

The Linux way, in summary: a small set of uniform abstractions, exposed through a single file API, layered carefully so that tinkering at one layer rarely breaks another. It is not the most beautiful operating system anyone has designed, but it is the most thoroughly engineered one, and that is why it runs the world.

Push On It

  1. On any Linux system, run cat /proc/cpuinfo, cat /proc/meminfo, cat /proc/interrupts, and cat /sys/class/net/eth0/statistics/rx_bytes (substituting your real interface). You are reading the kernel as a filesystem. Read one entry until you can explain what it is telling you.
  2. Pick a service on your machine — nginx, sshd, postgresql, anything. Find its systemd unit file (systemctl cat <name>). Read every line. Now break the service deliberately by editing the unit file, watch what happens, and restore it.
  3. Find a file on your machine you do not have access to. Use chmod and chown (with sudo) to give yourself, and then revoke, access. Verify each step by trying to read the file from a different user.
  4. Pick a sysctl parameter you have never touched. Read its meaning. Decide whether the default is right for your workload. If it is, leave it alone. If it isn't, change it and measure.

Live in the Shell

Open a terminal. Without leaving it for an hour: navigate the filesystem, change a file's permissions, create a new user, switch to it, edit a config file in `/etc`, restart a service via `systemctl`, and read a log in `/var/log`. The point is not to memorise commands. The point is to feel that the whole system is reachable from one prompt.
