Engineering · Anti-pattern

Why README-based Engineering Onboarding Always Rots

Every engineering README rots within two quarters. The pattern is structural, not editorial, and rewriting it harder is the wrong fix.

Written by

Elliot Bensabat

Co-founder, Capture

Published

21 April 2026

Pricing verified

May 2026

A 2,400-line scroll cracking down the middle next to a stack of twelve neat cards, brutalist editorial illustration suggesting the structural failure of monolithic engineering documentation

The numbers

Time-to-first-PR: 1 week
Week-1 Slack DMs: 1
"This should just work": 17 times
Unassisted setup: 90%

In 60 seconds

The short version.

A README is a single document that tries to be the canonical record of every step a new engineer needs to start working. The pattern fails in a specific way: it rots within two quarters, and the rot is not a writing problem. It is a structural problem with how a single document tracks a system that changes daily across Postgres versions, Node versions, SSL certs, VPN tunnels, environment variables, and the four scripts in `bin/` that nobody owns. Rewriting the README harder produces two clean weeks and then the same rot. The fix is to stop treating onboarding as a document at all.

01 · Section

The 17 places the README lies

A README does not lie deliberately. It lies because every claim it makes has a half-life shorter than the document's review cadence. The phrase "this should just work" appeared seventeen times in the 2,400-line README a staff engineer at a UK observability platform replaced last quarter. Each instance was a place the author had run out of patience for the edge case.

The catalogue below comes from the diff history of an actual onboarding doc at a Series B team of 65 engineers, comparable to a Snyk or Improbable mid-stage codebase.

#	Lie type	Example phrasing	What was actually true
1	Version drift	"Install Node 18"	Node 20.6+ required, 18 fails on a TypeScript decorator
2	Tool replaced	"Run `make build`"	Makefile removed in Q2, replaced by a Bun script
3	Config silence	"Add the environment variables"	Four undocumented vars block the dev server
4	Stale screenshot	Old AWS console screenshot	Console redesigned, the button is in another menu
5	OS assumption	"Brew install Postgres"	Linux laptops need apt, not brew, no fallback path
6	Network assumption	"VPN should connect"	Corp firewall blocks the cert exchange on macOS Sonoma
7	Order error	Steps 4 and 5 swapped	Running them in the listed order corrupts the local DB
8	Permission gap	"You should have access"	New hires get access on day three, not day one
9	Branch drift	"Check out `main`"	The default branch was renamed to `trunk` six months ago
10	Secrets path	"Pull from the vault"	The vault path changed, the CLI command requires a flag
11	Half-step	"Run the migrations"	The migration script needs a flag the README skips
12	Shadow dependency	No mention of Redis	Redis must be running or the auth service crashes
13	Untrusted should	"This should just work"	It does not, the failure is silent and the logs are empty
14	Vendor change	"Stripe test keys"	Stripe sandbox replaced by a forked mock service
15	Bin script	"Run `bin/setup`"	The script has not been touched since 2023 and assumes Python 2
16	Deploy reference	"See deploy guide"	The deploy guide was deleted in a wiki cleanup
17	Browser session	"Login at localhost"	SSL cert is self-signed and Chrome blocks it without the manual override

Each lie is small. The sum is what makes a new engineer's first week feel like archaeology. The engineering team documentation story at the same platform measured this: every new hire hit roughly six of these failures in week one.

02 · Section

Why rewriting the README never works

Three engineers had tried to rewrite the README in the year before Geoff, the staff engineer, recorded the setup. Each rewrite was excellent for two weeks. Then it rotted. The pattern is not about discipline or writing skill. It is structural and you can predict the failure mode without knowing the team.

A README has one author at a time. The first rewrite happens because the cost of the existing rot has become visible (a week of senior-engineer DMs per new hire). One person volunteers, blocks two days, produces something thorough. They ship it. The next week, a tool upgrades, a script changes, an env var gets added. The original author has moved on. The new change does not get backported because the cost of editing the file is higher than the cost of writing the answer in Slack to whoever asks. The rot starts in week three.

Compare this to the way code stays current. Code does not rot at the same rate because the build breaks when the code goes stale. The README is a read-only artefact from the build's perspective. Nothing fails when the README falls behind. NNGroup's research on why web users scan instead of reading describes the dominant pattern: readers scan, they do not read in full, and they trust the document less with every wrong claim they hit. By the time the new hire has hit three lies, they stop reading and DM the senior engineer. The senior engineer answers because answering is faster than rewriting line 1,847.

The structural conclusion is sharp. A README is the wrong format because the maintenance cost is borne by one person and the failure cost is borne by another, and that asymmetry is what makes rot inevitable. Rewriting does not change the asymmetry. It buys two weeks and resets the clock. The Monzo Engineering blog has written variations of this argument for years, and the conclusion holds for any team past fifteen engineers.

The path that works is to make the maintainer the same person who makes the change: whoever modifies a tool re-records the affected step in two minutes, no doc sprint required. That is what the Chrome-extension capture flow is built around.

03 · Section

What replaces a README

What replaces a README is a set of twelve short guides, each recorded by the engineer who solved that problem most recently. The 2,400-line README at the observability platform was replaced this way, and the results were measurable. Time-to-first-PR fell from three weeks to one. Week-one Slack DMs to senior engineers dropped from six per new hire to one. Unassisted setup completion went from "almost never" to ninety percent. These are the metrics a Staff Engineer or Engineering Manager at a B2B SaaS company with 50 to 200 engineers can use to justify the change to leadership.

The structure is specific. One main guide, twenty-three steps, walks the happy path on a fresh laptop. Each known failure mode (the wrong Node version, the SSL cert, the missing env vars, the VPN tunnel, the Postgres extension, the four bin/ scripts) has its own short troubleshooting guide. The main guide links to the troubleshooting guide at the exact step where that failure typically surfaces. The reader does not scan a 2,400-line file looking for their error. They click through.
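The linking structure described above can be sketched as a small data model. This is an illustrative sketch only: the class names, slugs, and step wording are hypothetical and are not Capture's actual data model, and only three of the twenty-three steps are shown.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    number: int
    instruction: str
    # Slug of the troubleshooting guide for the failure that typically
    # surfaces at this step, if there is one (hypothetical field).
    troubleshooting: Optional[str] = None


@dataclass
class Guide:
    slug: str
    title: str
    steps: list = field(default_factory=list)


def troubleshooting_for(guide: Guide, step_number: int) -> Optional[str]:
    """Return the troubleshooting slug linked from a given step, or None."""
    for step in guide.steps:
        if step.number == step_number:
            return step.troubleshooting
    return None


# Hypothetical excerpt of the main happy-path guide.
main_guide = Guide(
    slug="fresh-laptop-setup",
    title="Set up a fresh laptop",
    steps=[
        Step(1, "Clone the repository"),
        Step(2, "Install Node", troubleshooting="wrong-node-version"),
        Step(3, "Add the environment variables", troubleshooting="missing-env-vars"),
    ],
)

print(troubleshooting_for(main_guide, 2))
```

The point of the shape is that the link lives on the step, not in a separate index, so a reader who fails at step 2 is one click from the matching guide.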

The shape of each guide matters. Steps are numbered. Each has a screenshot of the actual current screen, not a year-old approximation. Each has the exact command, not a paraphrased version. The narration explains why the step exists, not just what it does, because new engineers stop trusting docs that read as command transcripts. NNGroup's F-shaped pattern reading research is why this works: readers scan the first few words of each line, hit the screenshot, then move on. Step granularity matches the scan pattern.

When a tool changes, only the affected step gets re-recorded. Two minutes, not a README rewrite. Andrew, the platform engineer who maintains the GoCardless webhook handler, films the upgraded step on a Tuesday afternoon and the next joiner inherits the corrected guide. The maintenance cost stays low enough that the engineer who made the change actually does the update. The failure cost stops compounding because the next hire never hits the stale step. The full walkthrough lives in the case for step-by-step guides.
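The per-step refresh rule can be expressed as a simple check: a step is stale when its tool changed after the step was last recorded. The sketch below assumes each step carries a tool name and a recording date. The field names, tools, and dates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RecordedStep:
    number: int
    tool: str          # the tool this step depends on (hypothetical field)
    recorded_on: date  # when the step was last recorded


def stale_steps(steps, tool_changed_on):
    """Return the numbers of steps recorded before their tool last changed."""
    return [
        step.number
        for step in steps
        if step.tool in tool_changed_on
        and step.recorded_on < tool_changed_on[step.tool]
    ]


# Illustrative data, not taken from the case study.
steps = [
    RecordedStep(1, "node", date(2026, 1, 10)),
    RecordedStep(2, "postgres", date(2026, 1, 10)),
    RecordedStep(3, "vpn", date(2026, 3, 2)),
]
tool_changed_on = {
    "node": date(2026, 2, 20),  # changed after step 1 was recorded
    "vpn": date(2026, 2, 1),    # changed before step 3 was re-recorded
}

print(stale_steps(steps, tool_changed_on))
```

Only step 1 is flagged here, which is the whole argument in miniature: the unit of maintenance is one step, not the document.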

04 · Section

The cost of keeping the README

The cost of keeping the README is not paid by the author. It is paid by the seniors who answer DMs and by the new hires whose first PR slips by two weeks. That is what makes the cost invisible to the engineer who could rewrite the doc. The accounting is wrong, and bad accounting protects bad patterns.

Run the maths at sixty-five engineers, four new hires per quarter, three weeks of ramp. Each new hire generates roughly six week-one DMs to senior engineers. Each DM takes about twenty minutes once context-switching is counted. That is two hours of senior-engineer time per new hire on the obvious failures alone, before the actual setup help. Four new hires per quarter at two hours each is eight hours per quarter on questions a guide could have answered.
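The arithmetic above is easy to rerun with your own team's numbers. A minimal sketch, using the paragraph's figures as defaults (the function and constant names are illustrative):

```python
# Defaults come from the paragraph above; swap in your own figures.
DMS_PER_NEW_HIRE = 6        # week-one DMs to senior engineers
MINUTES_PER_DM = 20         # including context-switching
NEW_HIRES_PER_QUARTER = 4


def senior_hours_per_hire(dms=DMS_PER_NEW_HIRE, minutes_per_dm=MINUTES_PER_DM):
    """Senior-engineer hours spent answering one new hire's week-one DMs."""
    return dms * minutes_per_dm / 60


def senior_hours_per_quarter(hires=NEW_HIRES_PER_QUARTER):
    """Senior-engineer hours per quarter across all new hires."""
    return hires * senior_hours_per_hire()


print(senior_hours_per_hire())     # hours per new hire
print(senior_hours_per_quarter())  # hours per quarter
```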

The per-new-hire cost is sharper. Two weeks of stalled ramp means zero PRs shipped in their first fortnight. At a fully-loaded cost of roughly £158,000 per senior engineer per year (the conservative number for a Series B London observability platform, around $200,000 USD at current rates), two weeks of zero output is roughly £6,100 per new hire that the company eats because the README rotted. Multiply by four new hires per quarter and the annual figure is just under £100,000, around $123,000 USD.
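The same calculation as a sketch you can adapt. The inputs are the figures quoted above. The exchange rate is the one implied by the article's own pair of numbers (£158,000 and $200,000), not a quoted market rate, and the 52-week year is an assumption.

```python
ANNUAL_COST_GBP = 158_000      # fully-loaded cost per engineer per year
WEEKS_PER_YEAR = 52            # assumption: cost spread over 52 weeks
STALLED_WEEKS = 2              # weeks of zero output per new hire
NEW_HIRES_PER_QUARTER = 4
QUARTERS_PER_YEAR = 4
GBP_TO_USD = 200_000 / 158_000  # rate implied by the figures above


def stalled_cost_per_hire(annual_cost=ANNUAL_COST_GBP, weeks=STALLED_WEEKS):
    """Cost in GBP of one new hire producing nothing for `weeks` weeks."""
    return annual_cost * weeks / WEEKS_PER_YEAR


def annual_stalled_cost(hires_per_quarter=NEW_HIRES_PER_QUARTER):
    """Cost in GBP per year across every new hire."""
    return stalled_cost_per_hire() * hires_per_quarter * QUARTERS_PER_YEAR


per_hire = stalled_cost_per_hire()
per_year = annual_stalled_cost()
print(f"Per new hire: £{per_hire:,.0f}")
print(f"Per year:     £{per_year:,.0f} (about ${per_year * GBP_TO_USD:,.0f})")
```

Unrounded, the per-hire figure comes to about £6,077, which the text rounds to £6,100. Either way the annual total lands just under £100,000.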

The cost compounds because it is not on any team's quarterly OKRs. The rot tax is paid by the engineering organisation in aggregate and shows up in nobody's review. The fix is to put the cost on the person who can prevent it: whoever changes the tool fixes the step in the same five minutes. That is what the team plan, which starts at $12 USD per seat, covers, and the cost calculation almost always justifies it within the first quarter.

There is a softer cost too. New hires who hit too many lies in their first week start their tenure with the wrong calibration. Colin, joining from a Darktrace team where docs were maintained, learns that documentation is unreliable, that the way to get answers is to DM seniors, that processes rot. That calibration is hard to reverse when leadership later wants those same engineers to write good docs themselves.

05 · Section

When a README is still the right format

A README is still the right format for things that change rarely and are read in full. Three cases qualify, and the rest belong in recorded guides.

The first case is the project's purpose statement: what the system does, what it does not do, who owns it. This information changes once a year at most. A reader reads it once and moves on. A short README paragraph is the right shape for it.

The second case is the contributing guide for an open-source project. Code style, branch naming, PR conventions, the contributor licence agreement. These rules apply to one-time contributors who will never be in the team's Slack. The reader does not have access to a Capture workspace, does not have a senior to DM, and benefits from a self-contained text document they can read on GitHub. The README is the right vehicle here.

The third case is the architecture overview, the top-level diagram and the description of which service talks to which. This document changes when the architecture changes, roughly once or twice a year on a stable system. It is read for understanding, not for execution, and the read-once pattern matches the README format.

Everything that changes more than once a quarter and gets read for execution is the wrong fit. Dev environment setup, deploy flow, on-call runbook, debugging patterns, the four shadow dependencies that block the auth service. All of those rot at the speed the underlying tools change, and all benefit from per-step recordings that cost two minutes to refresh.

A second reference for this split lives in how to document customer onboarding workflow, which makes the same argument in a different domain. Stephen, who runs platform at a UK fintech with FCA and ICO obligations on top of the usual delivery pressure, splits the same way: static reference content stays in the repo README, executable workflows live as recorded guides. NNGroup's work on legibility, readability, and comprehension supports the split: docs read for execution and docs read for understanding need different shapes. The structural problem is the same: a single document cannot track a process that changes faster than it gets edited. The fix scales across both domains.

I read line 1,847 and realised I was the third person to update this paragraph in eighteen months. The build system it described had been replaced twice.

Senior engineer, UK observability platform

FAQ

Frequently asked questions.

How long does it take to replace a 2,400-line README with recorded guides?

A staff engineer at a UK observability platform recorded the main twenty-three-step setup in roughly four hours, including the narration. Each of the six failure-mode troubleshooting guides took about thirty minutes. Total elapsed time was under two days for one engineer, faster than any of the three previous README rewrites the team had attempted. Full numbers in the engineering team documentation story.

Who maintains the guides once the original author moves on?

Whoever changes the tool re-records the affected step. The maintenance cost is two minutes per change, which is low enough that engineers actually do it instead of leaving it for someone else. The pattern works because the cost of the update is paid by the same person who triggered the need for the update, which removes the asymmetry that makes a README rot. Quarterly reviews catch any step that someone forgot to re-record, but the per-step cost is low enough that quarterly is mostly a safety net.

What about engineers who prefer reading text to watching guides?

Capture guides are not videos. Each step is a screenshot plus a written description, exported as Markdown, HTML, or PDF. A reader who wants to skim does so the same way they would skim a README, except the screenshots are accurate. The narration is optional and renders as text aligned to each step. The format optimises for scan-readers, which NNGroup research on legibility, readability, and comprehension shows is how technical docs are actually consumed.

Does this work for open-source projects with external contributors?

Partially. The architecture overview, the contributing guide, and the project purpose belong in the README because external contributors do not have access to a private workspace. The dev environment setup can be exported as PDF or HTML and committed to the repo, which gives external contributors the same scan-friendly format without forcing maintainers to write a 2,400-line file. The split is static content in README, executable workflows in recorded guides.

What is the right team size to switch to this pattern?

The pattern starts paying off at roughly fifteen engineers, when the senior-engineer-DM tax becomes visible enough to justify a change. Below that, the README is usually maintained by the same person who reads it. Above fifty engineers, the cost of senior-engineer interruption is too high for the existing format. The Staff Engineer or Engineering Manager at a B2B SaaS in the 50-200 range is the persona that gets the largest payoff, the same shape as a Snyk, Improbable, or Darktrace platform team mid-growth.

Take the next step

Ready to replace your engineering README with twelve recorded guides?

Capture records the setup once on a fresh laptop, narrates each step, and exports a per-step guide that updates in two minutes when a tool changes. Time-to-first-PR drops from three weeks to one in the case study above.
