Your logs tell you what your code thinks it did. Your stack trace tells you where the code believed it was. strace tells you what the kernel actually did — every openat(), every read(), every connect() that returned -1 while your application swallowed the error and kept lying to you. When the gap between intent and reality is the bug, re-reading the source will not close it. You have to observe the boundary where userspace ends and the kernel begins.
Most engineers know strace -p <pid> as a panic button. Few actually read the output, and fewer reach for -f, -c, or -k when it counts. That gap is the difference between staring at a wall of syscalls and extracting a root cause in ninety seconds. Treating strace as a first-class instrument is a direct extension of hypothesis-driven debugging methodology: you form a precise claim about what the process is doing and let the evidence confirm or kill it.
What strace actually observes
strace is built on ptrace(2), the same kernel facility debuggers use. It attaches to a target and stops it on every transition across the syscall boundary — once on entry, once on exit — reading the registers to decode the syscall number, its arguments, and its return value. That is the whole model, and it explains the cost: every syscall now pays for two extra context switches into the tracer. On a syscall-heavy workload the target can slow by an order of magnitude. This is not a footnote — it is the reason strace belongs in diagnosis, never in a hot path or a latency-sensitive production service.
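One way to check the overhead claim on your own machine is to time the same command with and without the tracer attached. The sketch below is illustrative only: `wall_time`, `strace_slowdown`, and the `WORKLOAD` command are hypothetical names introduced here, and the ratio you see will depend on the kernel, the strace version, and how syscall-heavy the workload really is.

```python
import shutil
import subprocess
import sys
import time
from typing import Optional, Sequence


def wall_time(argv: Sequence[str], repeats: int = 3) -> float:
    """Return the best-of-N wall-clock time in seconds for running argv."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            list(argv),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,  # only the elapsed time matters here
        )
        best = min(best, time.perf_counter() - start)
    return best


def strace_slowdown(argv: Sequence[str], repeats: int = 3) -> Optional[float]:
    """Return traced_time / plain_time, or None if strace is not on PATH."""
    if shutil.which("strace") is None:
        return None
    plain = wall_time(argv, repeats)
    # -o /dev/null discards the trace so terminal output is not what we measure.
    traced = wall_time(["strace", "-f", "-o", "/dev/null", "--", *argv], repeats)
    return traced / plain


# Hypothetical syscall-heavy workload: 20,000 stat() calls in a tight loop.
WORKLOAD = [sys.executable, "-c", "import os\nfor _ in range(20000): os.stat('/')"]

if __name__ == "__main__":
    ratio = strace_slowdown(WORKLOAD)
    print("strace not available" if ratio is None else f"slowdown: {ratio:.1f}x")
```

Best-of-N is used rather than the mean so that a single noisy run does not inflate either side of the ratio.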
Reading a trace line — the part nobody teaches
Every line has the same shape: syscall(arguments) = return_value. The return value is where the truth lives. A successful call returns a file descriptor, a byte count, or 0; a failure returns -1 followed by the symbolic errno and its description. That errno is the signal you came for.
openat(AT_FDCWD, "/etc/app/config.yaml", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/app/config.yaml", O_RDONLY) = 3
read(3, "host: 127.0.0.1\nport: 6379\n", 4096) = 27
connect(4, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ECONNREFUSED (Connection refused)
Four lines, two findings, zero guesswork: the app probed a config path that does not exist before falling back to the real one, and the Redis connection was refused outright. No log statement told you either fact; the syscalls did. The failure worth memorising is EFAULT: it means a syscall received a pointer to memory the process does not own. That is the same class of bad-address access which, followed to its conclusion by the MMU and the kernel, produces a segmentation fault.
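Because every line follows the syscall(arguments) = return_value shape, pulling the failures out of a trace mechanically takes only a few lines. The following is a minimal sketch, assuming plain single-process output with no -f, -tt, or -T decorations; `TraceLine`, `parse_line`, and `failures` are hypothetical helper names, and the regular expression covers only the common cases (it skips signal lines, exit markers, and unfinished calls).

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# name(args) = ret [ERRNO (description)]
_TRACE_LINE = re.compile(
    r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+"
    r"(?P<ret>0x[0-9a-f]+|-?\d+|\?)"
    r"(?:\s+(?P<errno>E[A-Z0-9]+)\s+\((?P<desc>[^)]*)\))?"
)


@dataclass(frozen=True)
class TraceLine:
    name: str
    args: str
    ret: str
    errno: Optional[str]
    desc: Optional[str]

    @property
    def failed(self) -> bool:
        # strace prints a symbolic errno only when the call failed.
        return self.errno is not None


def parse_line(line: str) -> Optional[TraceLine]:
    """Parse one trace line; return None for anything that is not a syscall."""
    m = _TRACE_LINE.match(line.strip())
    return TraceLine(**m.groupdict()) if m else None


def failures(lines: Iterable[str]) -> List[TraceLine]:
    """Return only the syscalls that came back with an errno."""
    parsed = (parse_line(ln) for ln in lines)
    return [p for p in parsed if p is not None and p.failed]


# The four lines from the example above.
SAMPLE = [
    r'openat(AT_FDCWD, "/etc/app/config.yaml", O_RDONLY) = -1 ENOENT (No such file or directory)',
    r'openat(AT_FDCWD, "/usr/share/app/config.yaml", O_RDONLY) = 3',
    r'read(3, "host: 127.0.0.1\nport: 6379\n", 4096) = 27',
    r'connect(4, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ECONNREFUSED (Connection refused)',
]

if __name__ == "__main__":
    for call in failures(SAMPLE):
        print(f"{call.name}: {call.errno} ({call.desc})")
```

Run against the sample, this prints the ENOENT from the first openat() and the ECONNREFUSED from connect(), which are the two findings read off by eye above.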
The flags that separate signal from noise
- -f: follow forks and threads. The moment a process calls fork() or spawns a worker pool, an unadorned trace goes blind to the children. Almost every real service needs this flag.
- -e trace=…: filter by class instead of drowning. -e trace=%network, -e trace=%file, and -e trace=%memory cut the stream to the syscalls you care about.
- -c: aggregate instead of stream. Produces a summary table of time, call count, and errors per syscall. This is your profiler the moment you suspect a process is syscall-bound.
- -T / -tt: time spent inside each syscall / absolute timestamps. Together they turn “it’s slow” into “it spends 600 ms in fsync”.
- -y / -yy: decode descriptors, so read(3, …) becomes read(3</usr/share/app/config.yaml>, …) and sockets show their endpoints.
- -k: attach a userspace stack trace to each syscall, bridging “which syscall” to “which line of code”.
- -s N: raise the printed string length (the default of 32 bytes truncates payloads).
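With -T, strace appends the time spent inside each call as a trailing `<seconds>` field, which makes "where did the time go" a sorting problem. The sketch below is one possible way to rank syscalls from such output; `time_by_syscall` and `slowest` are hypothetical helpers, the sample lines are invented for illustration, and the pattern assumes the optional `[pid N]` prefix from -f and the HH:MM:SS.microseconds stamp from -tt.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

_TIMED = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?"           # optional prefix added by -f
    r"(?:\d{2}:\d{2}:\d{2}\.\d+\s+)?"   # optional wall-clock stamp from -tt
    r"(?P<name>\w+)\(.*"                # syscall name and its arguments
    r"<(?P<secs>\d+\.\d+)>\s*$"         # time inside the syscall, from -T
)


def time_by_syscall(lines: Iterable[str]) -> Dict[str, float]:
    """Sum the -T durations per syscall name, ignoring unparseable lines."""
    totals: Dict[str, float] = defaultdict(float)
    for line in lines:
        m = _TIMED.match(line.strip())
        if m:
            totals[m["name"]] += float(m["secs"])
    return dict(totals)


def slowest(lines: Iterable[str], n: int = 3) -> List[Tuple[str, float]]:
    """Return the n syscalls with the largest cumulative time, largest first."""
    totals = time_by_syscall(lines)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Invented sample in the shape produced by `strace -tt -T`.
SAMPLE = [
    '14:02:11.000100 openat(AT_FDCWD, "/var/lib/app/data.db", O_RDWR) = 5 <0.000021>',
    '14:02:11.000180 write(5, "row-0001"..., 4096) = 4096 <0.000035>',
    '14:02:11.000260 fsync(5) = 0 <0.600123>',
    '14:02:11.600450 write(5, "row-0002"..., 4096) = 4096 <0.000030>',
]

if __name__ == "__main__":
    for name, secs in slowest(SAMPLE):
        print(f"{name:10s} {secs * 1000:9.3f} ms")
```

On the invented sample, fsync dominates by several orders of magnitude, which is the "600 ms in fsync" situation described in the list above.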
The summary mode also answers “why is this slow?” without a dedicated profiler. Run any command under strace -f -c and the table ranks syscalls by cumulative time — the same technique that turns a sluggish terminal into a fixable list of offending stat() and openat() calls when you profile shell startup and .bashrc performance.
From raw trace to an actionable summary
Reading -c output by eye is fine once. When you want it in CI, in a regression gate, or aggregated across runs, parse it. The wrapper below runs a command under strace -f -c, captures the summary from stderr, parses it into typed records, and returns the syscalls that dominate time. It handles the obvious failure modes — strace missing, the target hanging, a malformed table — instead of assuming the happy path.
from __future__ import annotations

import shutil
import subprocess
from dataclasses import dataclass
from typing import Final

_HEADER_TOKENS: Final[tuple[str, ...]] = ("seconds", "calls", "syscall")


@dataclass(frozen=True, slots=True)
class SyscallStat:
    name: str
    seconds: float
    calls: int
    errors: int

    @property
    def usec_per_call(self) -> float:
        return (self.seconds / self.calls) * 1_000_000 if self.calls else 0.0


class StraceError(RuntimeError):
    """Raised when strace cannot be run or its summary cannot be parsed."""


def profile_syscalls(cmd: list[str], *, timeout: float = 30.0) -> list[SyscallStat]:
    """Run *cmd* under ``strace -f -c`` and return per-syscall stats, busiest first.

    Raises:
        StraceError: strace is missing, the command times out, or the summary
            table cannot be located in strace's stderr.
    """
    if not cmd:
        raise StraceError("empty command")
    if shutil.which("strace") is None:
        raise StraceError("strace is not installed or not on PATH")
    try:
        proc = subprocess.run(
            ["strace", "-f", "-c", "--", *cmd],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,  # the traced command may legitimately exit non-zero
        )
    except subprocess.TimeoutExpired as exc:
        raise StraceError(f"traced command exceeded {timeout:.0f}s") from exc
    return _parse_summary(proc.stderr)


def _parse_summary(stderr: str) -> list[SyscallStat]:
    lines = stderr.splitlines()
    header = next(
        (i for i, ln in enumerate(lines) if all(t in ln for t in _HEADER_TOKENS)),
        None,
    )
    if header is None:
        raise StraceError("no summary table found; was -c passed to strace?")
    stats: list[SyscallStat] = []
    for line in lines[header + 1:]:
        if not line.strip() or line.lstrip().startswith("-"):
            continue
        if "total" in line:
            break  # the total row terminates the table
        cols = line.split()
        if len(cols) < 5:
            continue
        try:
            # cols: %time seconds usecs/call calls [errors] syscall
            seconds = float(cols[1])
            calls = int(cols[3])
            errors = int(cols[4]) if len(cols) >= 6 and cols[4].isdigit() else 0
        except (ValueError, IndexError):
            continue  # skip anything that is not a data row
        stats.append(SyscallStat(name=cols[-1], seconds=seconds, calls=calls, errors=errors))
    if not stats:
        raise StraceError("summary table present but no rows could be parsed")
    return sorted(stats, key=lambda s: s.seconds, reverse=True)


if __name__ == "__main__":
    import sys

    if len(sys.argv) < 2:
        raise SystemExit("usage: profile.py COMMAND [ARGS...]")
    for stat in profile_syscalls(sys.argv[1:]):
        marker = " (has errors)" if stat.errors else ""
        name = stat.name.ljust(18)
        calls = str(stat.calls).rjust(8)
        print(f"{name}{stat.seconds:9.4f}s {calls} calls {stat.usec_per_call:8.1f} us/call{marker}")
It is deliberately defensive: a missing binary, a timeout, and an unparseable table each raise a typed StraceError rather than returning silently wrong data. The check=False is intentional — the program you are tracing is allowed to exit non-zero; that is frequently the entire reason you are tracing it.
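For the regression-gate use mentioned earlier, the parsed stats still need a comparison rule. The sketch below shows one possible rule, assuming each run has been reduced to a mapping of syscall name to cumulative seconds (for example from the `name` and `seconds` fields returned by `profile_syscalls`). The function name `find_regressions` and both thresholds are hypothetical and would need tuning against the run-to-run noise of a real workload.

```python
from __future__ import annotations


def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    *,
    ratio: float = 1.5,         # flag syscalls that got at least this much slower
    min_seconds: float = 0.01,  # ignore syscalls too cheap to matter
) -> list[tuple[str, float, float]]:
    """Return (syscall, before, now) for regressions, largest increase first."""
    flagged: list[tuple[str, float, float]] = []
    for name, now in current.items():
        if now < min_seconds:
            continue
        before = baseline.get(name, 0.0)
        # A syscall absent from the baseline counts as a regression by itself.
        if before == 0.0 or now / before >= ratio:
            flagged.append((name, before, now))
    return sorted(flagged, key=lambda row: row[2] - row[1], reverse=True)


if __name__ == "__main__":
    # Invented numbers, in seconds of cumulative syscall time per run.
    baseline = {"openat": 0.010, "read": 0.020, "fsync": 0.100}
    current = {"openat": 0.012, "read": 0.050, "fsync": 0.100,
               "connect": 0.040, "stat": 0.001}
    for name, before, now in find_regressions(baseline, current):
        print(f"{name}: {before:.3f}s -> {now:.3f}s")
```

The absolute floor matters as much as the ratio: without it, a syscall moving from 10 microseconds to 30 microseconds would fail the gate while contributing nothing to real latency.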
strace vs ltrace vs gdb vs perf — pick the right lens
strace is one lens, not the only one. Each tool observes a different layer at a different cost, and reaching for the wrong one wastes time or perturbs the very behaviour you are chasing.
| Tool | Observes | Overhead | Best question | Production-safe? |
|---|---|---|---|---|
| strace | Syscalls (kernel boundary) | High (2 ctx switches/call) | What is the process asking the kernel to do? | No — diagnosis only |
| ltrace | Library / function calls (userspace) | High | Which library calls fire, with what arguments? | No |
| gdb | Full program state, memory, breakpoints | Very high (stops the world) | What is the exact state at this line? | No (interactive) |
| perf trace | Syscalls via perf_events | Low | How do syscalls behave under real load? | Yes |
| bpftrace / eBPF | Arbitrary kernel + user probes | Very low | Custom, low-overhead production tracing | Yes |
When not to reach for strace
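The overhead is not academic, and it is not the only constraint. On hardened systems, the Yama setting in /proc/sys/kernel/yama/ptrace_scope restricts attaching. At 1 (the default on some distributions) an unprivileged user can only trace its own descendants, so strace -p fails with EPERM even on a process you own; at 2, attaching requires CAP_SYS_PTRACE; at 3, attaching is disabled outright, no matter how root you feel. Inside containers, a restrictive seccomp profile or a missing CAP_SYS_PTRACE capability may block ptrace entirely. And on a service handling live traffic, multiplying the cost of every syscall is not a debugging session, it is an incident.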
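Before blaming strace for an EPERM, it can help to check what the Yama setting actually is. The following is a small sketch, assuming a Linux host; `read_ptrace_scope` and `describe_scope` are hypothetical helpers, and the descriptions paraphrase the four documented ptrace_scope levels. A missing file is treated as "Yama not in effect", which is an assumption of this sketch rather than a guarantee that attaching will work.

```python
from pathlib import Path
from typing import Optional

YAMA_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")

_SCOPE_MEANING = {
    0: "classic: a process may attach to others running under the same uid",
    1: "restricted: only descendants can be attached without CAP_SYS_PTRACE",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled for every process",
}


def read_ptrace_scope(path: Path = YAMA_PATH) -> Optional[int]:
    """Return the current ptrace_scope value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None  # file missing (no Yama, non-Linux) or unexpected content


def describe_scope(scope: Optional[int]) -> str:
    """Translate a ptrace_scope value into a one-line explanation."""
    if scope is None:
        return "ptrace_scope not readable; Yama restrictions may not apply"
    return _SCOPE_MEANING.get(scope, f"unknown ptrace_scope value {scope}")


if __name__ == "__main__":
    print(describe_scope(read_ptrace_scope()))
```

Note that this only covers Yama; a seccomp filter or a dropped capability inside a container can still deny ptrace even when the value reads 0.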
For anything that must run against production load, reach for perf trace or an eBPF tool such as bpftrace, which observe the same syscall boundary at a fraction of the cost. But on a box you control, with a process you can afford to slow down, strace remains the fastest way to answer the only question that matters when intent and reality diverge: what is this process actually doing? Once you can read its output fluently, that question stops being a mystery and becomes a lookup.
