What fork() in Linux Really Does — And Why It Matters More Than You Think

fork() is a system call every Linux systems programmer knows. You call it, you get two processes. The model is simple, and that is exactly why it leads to bugs that are hard to understand: it hides what actually happened under the hood.


What the documentation says — and why it’s not enough

man fork will tell you that the call creates a new process that’s a copy of the parent. It returns 0 in the child process, the child’s PID in the parent, and -1 if no process could be created. The classic example looks like this:

pid_t pid = fork();

if (pid == 0) {
    // we're in the child here
} else if (pid > 0) {
    // we're in the parent here
} else {
    // fork() failed
}
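
To make the two return values concrete, here is a minimal sketch, assuming a POSIX system. The helper name fork_returns_child_pid is made up for this illustration. The child sends its own getpid() through a pipe, and the parent compares it with what fork() returned.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the value fork() gave the parent
 * equals the PID the child reports for itself, 0 if not, -1 on error. */
int fork_returns_child_pid(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: fork() returned 0 here, so ask the kernel who we are. */
        pid_t self = getpid();
        close(fds[0]);
        ssize_t n = write(fds[1], &self, sizeof self);
        _exit(n == (ssize_t)sizeof self ? 0 : 1);
    }

    /* Parent: fork() returned the child's PID. */
    close(fds[1]);
    pid_t reported = -1;
    ssize_t n = read(fds[0], &reported, sizeof reported);
    close(fds[0]);
    waitpid(pid, NULL, 0);              /* collect the child's exit status */
    if (n != (ssize_t)sizeof reported)
        return -1;
    return reported == pid;
}
```

Called from main(), the function should return 1: the number the parent received from fork() is the same number the child sees as its own PID.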

Anyone who’s taken an operating systems course knows this pattern. The problem is that the documentation describes the interface, not the implementation. And the implementation is where things get interesting.


fork() in Linux and Copy-on-Write: why fork is fast

If fork() actually copied the entire process — all memory pages, the stack, the heap, the data — it would be catastrophically slow. A web server handling a thousand requests per second, each spawning a new process, would grind to a halt.

Linux solves this with a mechanism called Copy-on-Write (CoW).

After fork() returns, the child and parent share the same physical memory pages. The kernel copies the page tables, not the data behind them. Both processes see identical contents, and the kernel marks the writable private pages as read-only in both.

Copying happens only when one of the processes — parent or child — tries to write to a given page. The kernel catches this as a page fault, creates a copy of that specific page, assigns it to the process that wanted to write, and continues execution. The other process still sees the original.

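In practice, this means fork() is cheap compared with copying the whole process, but it is neither free nor O(1): the kernel still duplicates the page tables and other bookkeeping, so the cost grows with the size of the mapped address space. The data pages themselves cost nothing until someone writes to them, and then you pay only for the pages you actually modify after the fork.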
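
The visible effect of copy-on-write can be sketched in a few lines, assuming a Linux or other POSIX system. The helper name child_write_is_private and the 4 MiB buffer size are arbitrary choices for this illustration. The child overwrites a buffer it inherited, and the parent checks that its own view did not change.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the child's writes stayed invisible
 * to the parent, 0 if not, -1 on error. */
int child_write_is_private(void)
{
    size_t size = 4u * 1024 * 1024;     /* arbitrary 4 MiB buffer */
    char *buf = malloc(size);
    if (buf == NULL)
        return -1;
    memset(buf, 'P', size);             /* parent touches every page */

    pid_t pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* The first write to each shared page faults, and the kernel
         * gives the child a private copy of that page. */
        memset(buf, 'C', size);
        _exit(buf[0] == 'C' ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        free(buf);
        return -1;
    }
    /* The parent still sees its own data after the child rewrote "the same" buffer. */
    int ok = WIFEXITED(status) && WEXITSTATUS(status) == 0
             && buf[0] == 'P' && buf[size - 1] == 'P';
    free(buf);
    return ok;
}
```

The memory cost shows up in the child only: by rewriting the whole buffer it forces a private copy of every page, which is the situation behind "memory usage jumps after fork" reports.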


File descriptors and the subtlety you don’t see

Another thing fork() in Linux does automatically — and that most tutorials stay silent about — is inheriting file descriptors.

The child gets its own copy of the parent’s descriptor table, but each descriptor in it refers to the same open file description in the kernel as the parent’s. The file offset lives in that shared description, so both processes share it.

This has direct consequences:

If the parent has an open file and has written up to position 100, the child sees offset 100. If the child writes 50 bytes, the offset jumps to 150 — and when the parent writes next, it starts at 150, not 100.

A classic bug in code that calls fork() without thinking about descriptors: parent and child both write to the same log file through the inherited descriptor. No error appears. Nothing is overwritten, precisely because the offset is shared, but the records from the two processes end up interleaved in whatever order the writes happen.

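The correct approach is to close the descriptors the child doesn’t need immediately after fork(), before the child does anything else. If the child needs its own position in the file, it has to open() the file again: dup() and dup2() only create another descriptor for the same open file description, so the offset stays shared.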
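
The shared offset is easy to observe. The following is a sketch under a few assumptions: a POSIX system, a writable /tmp, and an invented helper name, parent_offset_after_child_write. The parent writes 7 bytes, the child writes 6 more through the inherited descriptor, and the parent then asks where its own descriptor points.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the parent's file offset after the child
 * has written through the inherited descriptor, or -1 on error. */
long parent_offset_after_child_write(void)
{
    char path[] = "/tmp/fork_offset_XXXXXX";    /* illustrative temp file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                   /* the file lives until fd is closed */

    if (write(fd, "parent\n", 7) != 7) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Same open file description, so this write starts at offset 7. */
        ssize_t n = write(fd, "child\n", 6);
        _exit(n == 6 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }

    /* The parent never wrote those 6 bytes, yet its offset has moved. */
    off_t pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return (long)pos;
}
```

The function should return 13, not 7: the child's write advanced the position that the parent sees as well.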


Signal handlers, mutexes, and problems you won’t expect in the child

After fork() the child inherits signal handlers defined by the parent. This is documented and often expected.

What’s less obvious: the child also inherits mutex state.

If at the moment fork() is called another thread of the parent holds a mutex (for example, a lock taken inside a library call such as malloc()), the child starts with a locked mutex that no one will ever release. Only the thread that called fork() exists in the child; the thread that held the mutex was never copied. The first time the child tries to take that lock, it deadlocks.
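
The inherited lock can be demonstrated without actually hanging, as in this sketch. It assumes POSIX threads (build with -pthread where your toolchain requires it), and every name in it (demo_lock, lock_holder, child_inherits_locked_mutex) is made up for the example. It uses pthread_mutex_trylock() in the child so that the program reports the locked state instead of deadlocking on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int locked_pipe[2];      /* holder -> main: "I hold the lock" */
static int release_pipe[2];     /* main -> holder: "you may unlock"  */

static void *lock_holder(void *arg)
{
    (void)arg;
    char c = 'x';
    pthread_mutex_lock(&demo_lock);
    ssize_t n = write(locked_pipe[1], &c, 1);   /* announce: lock is held */
    if (n == 1)
        n = read(release_pipe[0], &c, 1);       /* keep holding until told */
    (void)n;
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Hypothetical helper: returns 1 if the child found the mutex locked
 * even though the owning thread does not exist there, 0 if not, -1 on error. */
int child_inherits_locked_mutex(void)
{
    pthread_t tid;
    char c = 'x';
    if (pipe(locked_pipe) < 0 || pipe(release_pipe) < 0)
        return -1;
    if (pthread_create(&tid, NULL, lock_holder, NULL) != 0)
        return -1;
    if (read(locked_pipe[0], &c, 1) != 1)       /* wait until the lock is held */
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Only the forking thread exists here. A plain lock() would block
         * forever; trylock() just reports the inherited state. */
        _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 0 : 1);
    }

    ssize_t n = write(release_pipe[1], &c, 1);  /* let the holder unlock */
    (void)n;
    pthread_join(tid, NULL);
    close(locked_pipe[0]);
    close(locked_pipe[1]);
    close(release_pipe[0]);
    close(release_pipe[1]);
    if (pid < 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

In the parent the lock is released as soon as the holder thread is told to continue. In the child there is nobody left to release it.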

This is one of the reasons POSIX defines pthread_atfork(): a mechanism for registering handlers that run just before fork() in the parent and just after it in both parent and child, so that locks can be put into a known state. In practice, it’s hard to use and rarely implemented correctly.
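
As a rough sketch of the idea, assuming POSIX threads (build with -pthread where needed) and a single lock that is entirely under your control: the prepare handler takes the lock before the fork, and both post-fork handlers release it, so the child always starts with the lock free. The names cache_lock, busy_worker and child_can_lock_after_atfork are hypothetical. Real programs usually have locks they cannot reach this way, which is a large part of why the mechanism is hard to apply.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the parent before fork(): wait for the lock and take it. */
static void before_fork(void)       { pthread_mutex_lock(&cache_lock); }
/* Runs after fork() in the parent and in the child: release it again. */
static void after_fork_parent(void) { pthread_mutex_unlock(&cache_lock); }
static void after_fork_child(void)  { pthread_mutex_unlock(&cache_lock); }

static void *busy_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&cache_lock);
        /* pretend to touch shared state here */
        pthread_mutex_unlock(&cache_lock);
    }
    return NULL;
}

/* Hypothetical helper: returns 1 if the child could take the lock,
 * 0 if it found it locked, -1 on error. */
int child_can_lock_after_atfork(void)
{
    static int registered;
    if (!registered) {
        /* Handlers cannot be unregistered, so install them only once. */
        if (pthread_atfork(before_fork, after_fork_parent,
                           after_fork_child) != 0)
            return -1;
        registered = 1;
    }

    pthread_t tid;
    if (pthread_create(&tid, NULL, busy_worker, NULL) != 0)
        return -1;

    pid_t pid = fork();             /* the handlers run around this call */
    if (pid == 0)
        _exit(pthread_mutex_trylock(&cache_lock) == 0 ? 0 : 1);

    pthread_join(tid, NULL);
    if (pid < 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Because the forking thread itself holds the lock at the moment of the fork, no other thread can be caught in the middle of a critical section, and the child's copy of the lock is released by the child handler.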

Practical takeaway: if your program is multi-threaded and uses fork(), you either know what you’re doing or you have a serious problem you haven’t discovered yet.


Why exec() usually follows fork()

fork() itself is rarely the goal — it’s the means. The standard pattern is fork() + exec():

pid_t pid = fork();

if (pid == 0) {
    execv("/usr/bin/program", args);
    // execv() only returns if it failed
    _exit(127);  // leave without running the parent's atexit handlers or flushing its stdio buffers
}

execv() replaces the child process image entirely with a new program. Everything the child inherited through fork(), the entire address space of the parent, is discarded and replaced by a new executable.

Here’s where the cleverness of CoW reveals itself. Because the child calls exec() almost immediately and discards the inherited memory image, practically nothing is ever copied: at most a few pages, such as the stack page the child touches on its way into exec(). The CoW cost, already deferred until the moment of writing, is almost never incurred.

This is the foundation the entire Unix process model stands on — and why spawning new processes in Linux is faster than you might think.
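
Put together with wait(), the fork() + exec() pattern can be sketched as a small helper. This assumes a POSIX system. The name run_and_wait is invented here, and exit code 127 for a failed exec is a shell convention rather than something the kernel imposes.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `path` with `argv` in a child process and
 * returns its exit code, or -1 if it could not be started or waited for. */
int run_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execv(path, argv);
        /* Reached only if execv() failed. _exit() avoids running the
         * parent's atexit handlers a second time in the child. */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    /* Report only normal termination; a child killed by a signal gives -1. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with the argument vector { "sh", "-c", "exit 7", NULL } and the path /bin/sh, the helper should return 7. With a path that does not exist it should return 127.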


Zombies and orphans — two problems you see in ps aux

Two problems fork() in Linux generates if you don’t handle cleanup properly: zombies and orphans.

A zombie process appears when a child finishes but the parent never calls wait() to collect its exit status. A zombie consumes no CPU and almost no memory: its address space is already gone, and it only holds an entry in the kernel’s process table with the exit status. If the parent never calls wait(), the zombie stays until the parent dies.

At scale, zombies occupy entries in the kernel’s process table — which has a finite size. A server that spawns processes without proper wait() can eventually exhaust the limit and stop creating new processes.
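
One common way to keep a long-running parent from accumulating zombies is to reap children from a SIGCHLD handler. The sketch below assumes a POSIX system and a single-threaded parent. The names on_sigchld and spawn_and_reap are made up, and the children exit immediately only to keep the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped_children;

static void on_sigchld(int signo)
{
    (void)signo;
    int saved_errno = errno;
    /* Several children may exit before the handler runs once, so loop
     * until there is nothing left to collect. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_children++;
    errno = saved_errno;
}

/* Hypothetical helper: starts `count` children and returns how many
 * of them were reaped by the handler, or -1 on error. */
int spawn_and_reap(int count)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, &old_sa) < 0)
        return -1;

    /* Block SIGCHLD while forking so no notification is missed between
     * checking the counter and going to sleep. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    reaped_children = 0;
    int started = 0;
    for (; started < count; started++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            _exit(0);               /* child: nothing to do in this demo */
    }

    while (reaped_children < started)
        sigsuspend(&wait_mask);     /* atomically unblock SIGCHLD and wait */

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return (int)reaped_children;
}
```

A real server would do its normal work instead of sitting in sigsuspend(); the handler is the part that keeps the process table clean.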

In ps aux you recognize zombies by the letter Z in the STAT column and <defunct> next to the process name.
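
The same state letter can be read programmatically. This is a Linux-specific sketch that assumes /proc is mounted; the helper name zombie_state_before_reap is invented. It uses waitid() with WNOWAIT to wait until the child has exited without collecting it, reads the state field from /proc/PID/stat, and only then reaps the child.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the state letter of an exited but not yet
 * reaped child as reported by /proc, or -1 on error. */
int zombie_state_before_reap(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);                   /* child exits right away */

    /* Block until the child has exited, but leave it unreaped. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
        return -1;

    char path[64], line[512];
    int state = -1;
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (f != NULL) {
        /* Format: "pid (comm) state ...". The name may contain spaces or
         * parentheses, so search for the last ')'. */
        if (fgets(line, sizeof line, f) != NULL) {
            char *p = strrchr(line, ')');
            if (p != NULL && p[1] == ' ')
                state = (unsigned char)p[2];
        }
        fclose(f);
    }

    waitpid(pid, NULL, 0);          /* now reap: the zombie disappears */
    return state;
}
```

On a typical Linux system the function should return 'Z'. After the final waitpid() the /proc entry for that PID no longer exists.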

An orphan is the reverse situation: the parent terminates before the child. The child is adopted by init (PID 1), or by a closer ancestor that has registered itself as a subreaper, which calls wait() when the child terminates. Orphans are usually harmless because their new parent cleans up after them automatically.


What this changes in practice

fork() in Linux shows up in places you don’t always expect: web servers spawning workers (Apache MPM prefork), shells executing commands, testing libraries isolating tests in separate processes, job queue systems.

Every time something unexpected happens with memory, files, or signals after a fork — you now know where to look. CoW explains why modifying a large object after fork() suddenly uses a lot of memory. Shared descriptors explain interleaved logs. Inherited mutexes explain deadlocks in a process that “should be clean.” The same hypothesis-and-deduction approach I described in the debugging article applies directly here.

The “fork copies the process” model isn’t false. It’s just not precise enough to solve problems.


How to investigate this yourself

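You don’t have to take my word for it. To see fork() in Linux in action on your own system, use two tools that are available on practically every Linux system: strace, which you may need to install from your distribution’s packages, and the /proc filesystem.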

# Observe process creation and system calls
strace -f -e trace=fork,clone,execve your_program

# Check the process memory map before and after fork()
cat /proc/PID/maps

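strace -f follows system calls across all child processes; the -f flag means “follow forks.” Don’t be surprised if no fork line shows up: current glibc implements fork() on top of the clone() system call, which is why clone is in the filter. Once you see raw system calls instead of C library abstractions, you stop guessing what the system is doing.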



What happens to the address space and copy-on-write pages after a fork() ties directly into the fault mechanics described in the article on the anatomy of a segfault — from the MMU through the kernel to a gdb core dump. The process you’ve just created can then be isolated with native kernel primitives — the subject of the piece on cgroups v2 without Docker.

Piotr Karasiński
Piotr Karasiński is self-taught in software and a GNU/Linux and systems architecture enthusiast. He writes about the layer between "it works" and "I understand why it works" at devmindset.dev.
