Why README-based Engineering Onboarding Always Rots
Every engineering README rots within two quarters. The pattern is structural, not editorial, and rewriting it harder is the wrong fix.


- Time-to-first-PR: 1 week
- Week-1 Slack DMs per new hire: 1
- "This should just work" in the old README: 17 times
- Unassisted setup completion: 90%
The short version.
A README is a single document that tries to be the canonical record of every step a new engineer needs to start working. The pattern fails in a specific way: it rots within two quarters, and the rot is not a writing problem. It is a structural problem with how a single document tracks a system that changes daily across Postgres versions, Node versions, SSL certs, VPN tunnels, environment variables, and the four scripts in `bin/` that nobody owns. Rewriting the README harder produces two clean weeks and then the same rot. The fix is to stop treating onboarding as a document at all.
The 17 places the README lies
A README does not lie deliberately. It lies because every claim it makes has a half-life shorter than the document's review cadence. The phrase "this should just work" appeared seventeen times in the 2,400-line README a staff engineer at a UK observability platform replaced last quarter. Each instance was a place the author had run out of patience for the edge case.
The catalogue below comes from the diff history of an actual onboarding doc at a Series B team of 65 engineers, comparable to a Snyk or Improbable mid-stage codebase.
| # | Lie type | Example phrasing | What was actually true |
|---|---|---|---|
| 1 | Version drift | "Install Node 18" | Node 20.6+ required, 18 fails on a TypeScript decorator |
| 2 | Tool replaced | "Run make build" | Makefile removed in Q2, replaced by a Bun script |
| 3 | Config silence | "Add the environment variables" | Four undocumented vars block the dev server |
| 4 | Stale screenshot | Old AWS console screenshot | Console redesigned, the button is in another menu |
| 5 | OS assumption | "Brew install Postgres" | Linux laptops need apt, not brew, and the README offers no fallback path |
| 6 | Network assumption | "VPN should connect" | Corp firewall blocks the cert exchange on macOS Sonoma |
| 7 | Order error | Steps 4 and 5 swapped | Running them in the listed order corrupts the local DB |
| 8 | Permission gap | "You should have access" | New hires get access on day three, not day one |
| 9 | Branch drift | "Check out main" | The default branch was renamed to trunk six months ago |
| 10 | Secrets path | "Pull from the vault" | The vault path changed, the CLI command requires a flag |
| 11 | Half-step | "Run the migrations" | The migration script needs a flag the README skips |
| 12 | Shadow dependency | No mention of Redis | Redis must be running or the auth service crashes |
| 13 | Untrusted should | "This should just work" | It does not, the failure is silent and the logs are empty |
| 14 | Vendor change | "Stripe test keys" | Stripe sandbox replaced by a forked mock service |
| 15 | Bin script | "Run bin/setup" | The script has not been touched since 2023 and assumes Python 2 |
| 16 | Deploy reference | "See deploy guide" | The deploy guide was deleted in a wiki cleanup |
| 17 | Browser session | "Login at localhost" | SSL cert is self-signed and Chrome blocks it without the manual override |
Each lie is small. The sum is what makes a new engineer's first week feel like archaeology. The engineering team documentation story at the same platform measured this: every new hire hit roughly six of these failures in week one.
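Several of these lies are mechanically checkable, which is part of why they are so cheap to hit and so easy to miss in prose. As a minimal sketch, assuming Python 3.9+ on the new laptop, the check below covers three rows from the table: version drift (row 1), config silence (row 3) and the shadow Redis dependency (row 12). The four environment-variable names are hypothetical placeholders, because the source does not name them, and 6379 is simply Redis's default port.

```python
import os
import re
import socket
import subprocess
from typing import Mapping, Sequence

# Assumed values for illustration; the real variable names are not documented.
MIN_NODE = (20, 6)  # row 1: Node 20.6+ required
REQUIRED_ENV = ["DATABASE_URL", "REDIS_URL", "AUTH_SECRET", "SESSION_KEY"]  # row 3 (hypothetical names)
REDIS_ADDR = ("127.0.0.1", 6379)  # row 12: Redis on its default port


def parse_version(text: str) -> tuple[int, ...]:
    """Turn `node --version` output such as 'v20.6.1' into (20, 6, 1)."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def meets_minimum(version_text: str, minimum: tuple[int, ...]) -> bool:
    """Tuple comparison: (20, 6, 1) >= (20, 6) but (18, 19, 0) < (20, 6)."""
    return parse_version(version_text) >= minimum


def missing_env_vars(required: Sequence[str], environ: Mapping[str, str]) -> list[str]:
    """Names from `required` that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def preflight(node_version: str, environ: Mapping[str, str]) -> list[str]:
    """Collect every failure rather than stopping at the first one."""
    failures = []
    if not meets_minimum(node_version, MIN_NODE):
        failures.append(f"Node {node_version} is older than 20.6")
    failures += [f"{name} is not set" for name in missing_env_vars(REQUIRED_ENV, environ)]
    if not port_open(*REDIS_ADDR):
        failures.append("nothing is listening on the Redis port")
    return failures


if __name__ == "__main__":
    try:
        node = subprocess.run(["node", "--version"], capture_output=True, text=True).stdout.strip()
    except FileNotFoundError:  # node is not on PATH at all
        node = ""
    for failure in preflight(node or "0.0.0", os.environ):
        print("FAIL:", failure)
```

Collecting every failure instead of exiting on the first mirrors how a new hire actually meets these lies: several at once, in the same week.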
Why rewriting the README never works
Three engineers had tried to rewrite the README in the year before Geoff, the staff engineer, recorded the setup. Each rewrite was excellent for two weeks. Then it rotted. The pattern is not about discipline or writing skill. It is structural and you can predict the failure mode without knowing the team.
A README has one author at a time. The first rewrite happens because the cost of the existing rot has become visible (a week of senior-engineer DMs per new hire). One person volunteers, blocks two days, produces something thorough. They ship it. The next week, a tool upgrades, a script changes, an env var gets added. The original author has moved on. The new change does not get backported because the cost of editing the file is higher than the cost of writing the answer in Slack to whoever asks. The rot starts in week three.
Compare this to the way code stays current. Code does not rot at the same rate because the build breaks when the code goes stale. The README is a read-only artefact from the build's perspective. Nothing fails when the README falls behind. NNGroup's research on why web users scan instead of reading describes the dominant pattern: readers scan, they do not read in full, and they trust the document less with every wrong claim they hit. By the time the new hire has hit three lies, they stop reading and DM the senior engineer. The senior engineer answers because answering is faster than rewriting line 1,847.
The structural conclusion is sharp. A README is the wrong format because the maintenance cost is borne by one person and the failure cost is borne by another, and that asymmetry is what makes rot inevitable. Rewriting does not change the asymmetry. It buys two weeks and resets the clock. The Monzo Engineering blog has written variations of this argument for years, and the conclusion holds for any team past fifteen engineers.
The path that works is to make maintenance the same person as the change: whoever modifies a tool re-records the affected step in two minutes, no doc sprint required. That is what the Chrome-extension capture flow is built around.
What replaces a README
What replaces a README is twelve short guides, one per failure mode, recorded by the engineer who solved that failure most recently. The 2,400-line README at the observability platform got replaced by twelve guides. The numbers map directly: time-to-first-PR fell from three weeks to one. Week-one Slack DMs to senior engineers dropped from six per new hire to one. Unassisted setup completion went from "almost never" to ninety percent. These are the metrics a Staff Engineer or Engineering Manager at a B2B SaaS, 50-200 engineers, can use to justify the change to leadership.
The structure is specific. One main guide, twenty-three steps, walks the happy path on a fresh laptop. Each known failure mode (the wrong Node version, the SSL cert, the missing env vars, the VPN tunnel, the Postgres extension, the four bin/ scripts) has its own short troubleshooting guide. The main guide links to the troubleshooting guide at the exact step where that failure typically surfaces. The reader does not scan a 2,400-line file looking for their error. They click through.
The shape of each guide matters. Steps are numbered. Each has a screenshot of the actual current screen, not a year-old approximation. Each has the exact command, not a paraphrased version. The narration explains why the step exists, not just what it does, because new engineers stop trusting docs that read as command transcripts. NNGroup's research on the F-shaped reading pattern explains why this works: readers scan the first few words of each line, hit the screenshot, then move on. Step granularity matches the scan pattern.
When a tool changes, only the affected step gets re-recorded. Two minutes, not a README rewrite. Andrew, the platform engineer who maintains the GoCardless webhook handler, records the upgraded step on a Tuesday afternoon and the next joiner inherits the corrected guide. The maintenance cost stays low enough that the engineer who made the change actually does the update. The failure cost stops compounding because the next hire never hits the stale step. The full walkthrough lives in the case for step-by-step guides.
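The per-step maintenance model can be sketched as a small data structure. This is a hypothetical illustration, not Capture's data model: each step lists the tools it depends on and, optionally, the troubleshooting guide linked at that exact step, so a tool change maps to exactly the steps that need re-recording. The step titles, tool names and guide slugs are assumptions loosely based on the failure modes listed above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    number: int
    title: str
    tools: set[str] = field(default_factory=set)  # a change to any of these invalidates the step
    troubleshooting: Optional[str] = None  # short guide linked at this exact step


def steps_to_rerecord(guide: list[Step], changed_tool: str) -> list[int]:
    """Numbers of the steps that depend on `changed_tool`; all others are left alone."""
    return [step.number for step in guide if changed_tool in step.tools]


# Hypothetical excerpt of a main guide; titles, tools and slugs are illustrative.
main_guide = [
    Step(1, "Install Node 20.6+", {"node"}, "wrong-node-version"),
    Step(2, "Connect the VPN", {"vpn"}, "vpn-tunnel"),
    Step(3, "Start Postgres", {"postgres"}, "postgres-extension"),
    Step(4, "Run the migrations", {"postgres", "migration-script"}),
    Step(5, "Open the app on localhost", {"ssl-cert"}, "self-signed-cert"),
]

print(steps_to_rerecord(main_guide, "postgres"))  # [3, 4]
```

A Postgres upgrade touches two steps out of five here; every other step stays exactly as recorded, which is the property that keeps a refresh small.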
The cost of keeping the README
The cost of keeping the README is not paid by the author. It is paid by the seniors who answer DMs and by the new hires whose first PR slips by two weeks. That is what makes the cost invisible to the engineer who could rewrite the doc. The accounting is wrong, and bad accounting protects bad patterns.
Run the maths at sixty-five engineers, four new hires per quarter, three weeks of ramp. Each new hire generates roughly six week-one DMs to senior engineers. Each DM takes about twenty minutes once context-switching is counted. That is two hours of senior-engineer time per new hire on the obvious failures alone, before the actual setup help. Four new hires per quarter at two hours each is eight hours per quarter on questions a guide could have answered.
The per-new-hire cost is sharper. Two weeks of stalled ramp means zero PRs shipped in a new hire's first fortnight. At a fully-loaded cost of roughly £158,000 per senior engineer per year (the conservative number for a Series B London observability platform, around $200,000 USD at current rates), two weeks of zero output is roughly £6,100 per new hire that the company eats because the README rotted. At four new hires per quarter, sixteen a year, the annual figure is just under £100,000, around $123,000 USD.
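The arithmetic in the two paragraphs above fits in a short script. The inputs are the article's own figures; the USD conversion uses the rate implied by its £158,000 ≈ $200,000 pairing rather than a live exchange rate, and a 52-week year is assumed.

```python
WEEKS_PER_YEAR = 52


def dm_hours_per_quarter(hires: int, dms_per_hire: int, minutes_per_dm: int) -> float:
    """Senior-engineer hours spent answering week-one DMs in one quarter."""
    return hires * dms_per_hire * minutes_per_dm / 60


def stalled_ramp_cost(annual_cost: float, weeks_stalled: int) -> float:
    """Salary cost of `weeks_stalled` weeks with zero output."""
    return annual_cost / WEEKS_PER_YEAR * weeks_stalled


HIRES_PER_QUARTER = 4
USD_PER_GBP = 200_000 / 158_000  # rate implied by the article's own conversion

per_hire = stalled_ramp_cost(158_000, weeks_stalled=2)
annual_gbp = per_hire * HIRES_PER_QUARTER * 4  # four quarters a year

print(f"{dm_hours_per_quarter(HIRES_PER_QUARTER, 6, 20):.0f} DM hours per quarter")  # 8 DM hours per quarter
print(f"£{per_hire:,.0f} per hire")  # £6,077 per hire
print(f"£{annual_gbp:,.0f} a year")  # £97,231 a year
print(f"${annual_gbp * USD_PER_GBP:,.0f} a year")  # $123,077 a year
```

The unrounded per-hire figure is £6,077, which the text rounds to £6,100; either way, sixteen hires a year lands just under £100,000.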
The cost compounds because it is not on any team's quarterly OKRs. The rot tax is paid by the engineering organisation in aggregate and shows up in nobody's review. The fix is to put the cost on the same person who can prevent it: whoever changes the tool, fixes the step, in the same five minutes. That is what the team plan covers, starting at $12 USD per seat, and the cost calculation almost always justifies it within the first quarter.
There is a softer cost too. New hires who hit too many lies in their first week start their tenure with the wrong calibration. Colin, joining from a Darktrace team where docs were maintained, learns that documentation is unreliable, that the way to get answers is to DM seniors, that processes rot. That calibration is hard to reverse when leadership later wants those same engineers to write good docs themselves.
When a README is still the right format
A README is still the right format for things that change rarely and are read in full. Three cases qualify, and the rest belong in recorded guides.
The first case is the project's purpose statement. What the system does, what it does not do, who owns it. This information changes once a year at most. A reader scans it once and moves on. A README sentence is the right shape for it.
The second case is the contributing guide for an open-source project. Code style, branch naming, PR conventions, the contributor licence agreement. These rules apply to one-time contributors who will never be in the team's Slack. The reader does not have access to a Capture workspace, does not have a senior to DM, and benefits from a self-contained text document they can read on GitHub. The README is the right vehicle here.
The third case is the architecture overview, the top-level diagram and the description of which service talks to which. This document changes when the architecture changes, roughly once or twice a year on a stable system. It is read for understanding, not for execution, and the read-once pattern matches the README format.
Everything that changes more than once a quarter and gets read for execution is the wrong fit. Dev environment setup, deploy flow, on-call runbook, debugging patterns, the four shadow dependencies that block the auth service. All of those rot at the speed the underlying tools change, and all benefit from per-step recordings that cost two minutes to refresh.
A second reference for this split lives in how to document customer onboarding workflow, which makes the same argument in a different domain. Stephen, who runs platform at a UK fintech with FCA and ICO obligations on top of the usual delivery pressure, splits the same way: static reference content stays in the repo README, executable workflows live as recorded guides. NNGroup's work on legibility, readability, and comprehension supports the split: docs read for execution and docs read for understanding need different shapes. The structural problem is the same: a single document cannot track a process that changes faster than it gets edited. The fix scales across both domains.
Frequently asked questions.
- How long does it take to replace a 2,400-line README with recorded guides?
A staff engineer at a UK observability platform recorded the main twenty-three-step setup in roughly four hours, including the narration. Each of the six failure-mode troubleshooting guides took about thirty minutes. Total elapsed time was under two days for one engineer, faster than any of the three previous README rewrites the team had attempted. Full numbers in the engineering team documentation story.
- Who maintains the guides once the original author moves on?
Whoever changes the tool re-records the affected step. The maintenance cost is two minutes per change, which is low enough that engineers actually do it instead of leaving it for someone else. The pattern works because the cost of the update is paid by the same person who triggered the need for the update, which removes the asymmetry that makes a README rot. Quarterly reviews catch any step that someone forgot to re-record, but the per-step cost is low enough that quarterly is mostly a safety net.
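The quarterly safety net can be as simple as comparing dates. A minimal sketch, assuming you can get the date each step was last recorded and the date each tool last changed (for example, from the commit that bumped it): a step is stale if any tool it uses changed after the recording. The step numbers, tools and dates below are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RecordedStep:
    number: int
    recorded_on: date
    tools: frozenset[str]


def stale_steps(steps: list[RecordedStep], tool_changed_on: dict[str, date]) -> list[int]:
    """Step numbers recorded before the latest change to any tool they depend on."""
    stale = []
    for step in steps:
        changes = [tool_changed_on[tool] for tool in step.tools if tool in tool_changed_on]
        if changes and max(changes) > step.recorded_on:
            stale.append(step.number)
    return stale


# Hypothetical audit input.
steps = [
    RecordedStep(3, date(2025, 1, 10), frozenset({"postgres"})),
    RecordedStep(7, date(2025, 3, 2), frozenset({"node"})),
    RecordedStep(12, date(2024, 11, 5), frozenset({"vpn"})),
]
last_changed = {"postgres": date(2025, 2, 1), "node": date(2025, 2, 20), "vpn": date(2024, 9, 1)}

print(stale_steps(steps, last_changed))  # [3]
```

Only step 3 is flagged: Postgres changed after it was recorded, while the Node and VPN steps were re-recorded after their last change.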
- What about engineers who prefer reading text to watching guides?
Capture guides are not videos. Each step is a screenshot plus a written description, exported as Markdown, HTML, or PDF. A reader who wants to skim does so the same way they would skim a README, except the screenshots are accurate. The narration is optional and renders as text aligned to each step. The format optimises for scan-readers, which NNGroup research on legibility, readability, and comprehension shows is how technical docs are actually consumed.
- Does this work for open-source projects with external contributors?
Partially. The architecture overview, the contributing guide, and the project purpose belong in the README because external contributors do not have access to a private workspace. The dev environment setup can be exported as PDF or HTML and committed to the repo, which gives external contributors the same scan-friendly format without forcing maintainers to write a 2,400-line file. The split is static content in the README, executable workflows in recorded guides.
- What is the right team size to switch to this pattern?
The pattern starts paying off at roughly fifteen engineers, when the senior-engineer-DM tax becomes visible enough to justify a change. Below that, the README is usually maintained by the same person who reads it. Above fifty engineers, the cost of senior-engineer interruption is too high for the existing format. The Staff Engineer or Engineering Manager at a B2B SaaS in the 50-200 range is the persona that gets the largest payoff, the same shape as a Snyk, Improbable, or Darktrace platform team mid-growth.
Ready to replace your engineering README with twelve recorded guides?
Capture records the setup once on a fresh laptop, narrates each step, and exports a per-step guide that updates in two minutes when a tool changes. Time-to-first-PR drops from three weeks to one in the case study above.