Engineering · Internal documentation

Replacing a 2,400-line README with twelve guides.

A staff engineer at a B2B observability platform recorded the dev environment setup once. Time-to-first-PR for new hires dropped from three weeks to one.

Staff Engineer, B2B observability platform, 65 engineers, Series B
Time-to-first-PR: 1 week, down from 3 weeks (for new engineers)
Week-1 Slack DMs: 1, down from 6 (per new hire)
Guide library: 12 guides (dev env, deploy, on-call)
Unassisted setup: 90% (new hires finishing alone)

01

The engineering onboarding doc was a 2,400-line README. It was thorough but untrustworthy. Some sections referenced a build system that had since been replaced. Others assumed an OS version that no laptop in the company still ran. The phrase 'this should just work' appeared seventeen times.

Every new engineer hit the same failures: the missing Postgres extension, the wrong Node version, the SSL cert that needed regenerating, the four environment variables nobody had documented. Each failure became a Slack DM to a senior engineer. By month two of a hiring wave, senior engineers were spending half their week on onboarding.

The fix everyone proposed was 'rewrite the README.' Three engineers had tried in the previous year. Each rewrite was great for two weeks, then went stale. The README rotted because nobody owned it, and the cost of that rot was spread across every senior engineer who fielded the same DMs.

Every engineer onboarding hit the same six failure modes. I recorded each one being solved, and the failures stopped feeling personal.
Staff Engineer
B2B observability platform, 65 engineers, Series B

02

The setup got recorded from scratch on a fresh laptop, with Capture running. Every step and every failure got narrated. The output was a twenty-three-step guide with screenshots of the actual current setup.

The guide replaced the README. New hires opened it on day one and worked through it. The six known failure modes got their own short troubleshooting guides, linked from the main one.

When a step changes (a tool upgrade, a new env var), the affected step gets re-recorded. Two minutes of work, not a README rewrite. Onboarding-related senior-engineer DMs dropped from six per new hire in week one to about one. That one is usually genuinely interesting.

[Figure: Onboarding pipeline from clone to first PR]

03

  1. Do the setup live.

    Fresh laptop, Capture running. Narrate every step, including the failures.

  2. Make the failures first-class.

    Each known failure mode gets its own short troubleshooting guide.

  3. Link from one place.

    The engineering wiki has one entry point: Start here.

  4. Maintain step by step.

    Re-record only the affected step when a tool changes, not the whole guide.

  5. Track first-PR time.

    Onboarding success is measured by time to first shipped PR.
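
The last step can be tracked with very little tooling. The sketch below is one possible way to compute time to first shipped PR, not the method the team in this story used. It assumes you can export each hire's start date and the merge date of their first PR. The `Hire` record, its field names, and the sample cohort are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class Hire:
    # Hypothetical record; field names are assumptions, not from the case study.
    name: str
    start_date: date
    first_pr_merged: Optional[date]  # None until the first PR ships


def days_to_first_pr(hire: Hire) -> Optional[int]:
    """Calendar days from start date to first merged PR, or None if not shipped yet."""
    if hire.first_pr_merged is None:
        return None
    return (hire.first_pr_merged - hire.start_date).days


def median_days_to_first_pr(hires: list[Hire]) -> Optional[float]:
    """Median over hires who have shipped a PR; None when nobody has."""
    values = [d for d in map(days_to_first_pr, hires) if d is not None]
    return median(values) if values else None


# Made-up cohort for illustration only.
cohort = [
    Hire("a", date(2024, 3, 4), date(2024, 3, 11)),  # 7 days
    Hire("b", date(2024, 3, 4), date(2024, 3, 8)),   # 4 days
    Hire("c", date(2024, 3, 18), None),              # has not shipped yet
]
print(median_days_to_first_pr(cohort))  # prints 5.5
```

The median is used here because a single slow onboarding would skew a mean. Hires who have not shipped yet are left out of the figure, so it is worth reporting their count alongside it.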

04

Time-to-first-PR fell from three weeks to one. Senior engineers got back the time they had been spending on onboarding. Onboarding-related DM volume dropped roughly 80%, from six per new hire in week one to about one.

The pattern spread. The on-call runbook got the same treatment. The PR review process got recorded. The deploy flow got recorded. The engineering wiki is now twelve guides, not 2,400 lines of mostly stale README.

[Figure: Time-to-first-PR distribution, before and after]