Runbooks are calm until the pager goes off. Then the small gaps become loud. The dashboard name changed. The rollback command is missing a flag. The person who remembers the vendor escalation path is asleep. The postmortem later says "update documentation," and everyone agrees, and the runbook quietly drifts again.

DevOps documentation works best when it is close to the work: incident timelines, commands, dashboards, deploy notes, follow-up decisions, and the postmortem that explains what changed. Docapybara gives that material a searchable vault, with Capy available to pull context together when you need it.

This guide is not about replacing your alerting, status page, ticket tracker, or incident tool. Keep those. Use Docapybara for the operational memory around them.

## Write runbooks for the tired operator

A runbook should assume the reader is interrupted, under time pressure, and not excited to solve a puzzle. Put the first useful action near the top. Include what the runbook covers, when to use it, when not to use it, and the safest initial checks.

Use plain headings: `Symptoms`, `First checks`, `Mitigation`, `Rollback`, `Escalation`, `Aftercare`, `Links`. Keep commands in code blocks. Next to each command, note the expected output so the operator can tell whether it worked.
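If you want a quick check that a runbook draft follows this layout, a small script can list the headings that are still missing. This is a minimal sketch, assuming the runbook is available as markdown text. The function name `missing_headings` is hypothetical and not part of Docapybara.

```python
import re

# Headings suggested in this guide; adjust the list to your own convention.
REQUIRED_HEADINGS = [
    "Symptoms",
    "First checks",
    "Mitigation",
    "Rollback",
    "Escalation",
    "Aftercare",
    "Links",
]


def missing_headings(runbook_markdown: str) -> list:
    """Return the required headings that never appear as a markdown heading."""
    found = {
        match.group(1).strip().lower()
        for match in re.finditer(
            r"^#{1,6}[ \t]+(.+?)[ \t]*$", runbook_markdown, flags=re.MULTILINE
        )
    }
    # Preserve the order of REQUIRED_HEADINGS so the output reads like a checklist.
    return [h for h in REQUIRED_HEADINGS if h.lower() not in found]


if __name__ == "__main__":
    draft = "# Checkout timeouts\n\n## Symptoms\n\n## First checks\n\n## Rollback\n"
    print(missing_headings(draft))
```

The check only compares heading text. It says nothing about whether the section underneath is useful, so it complements the "hand it to someone else" test rather than replacing it.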

If a runbook starts looking like a process document, [Standard Operating Procedures, Without the Wiki Maintenance Tax](/guides/field-service-ops/ai-notes-standard-operating-procedures/) is a useful companion. The same maintenance problem shows up in both places.

One useful test: hand the runbook to someone who has not touched the service recently and ask what they would do first. If they hesitate at the first screen, the runbook is asking for too much background knowledge. Add the missing context while it is still obvious.

## Keep incident timelines as raw material

During an incident, capture the timeline without trying to write the postmortem. Timestamped notes are enough: alert fired, first check, hypothesis, mitigation, customer impact, vendor contact, deploy, recovery, follow-up.
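One way to keep timeline notes consistent is to give every entry the same timestamped shape. The sketch below is illustrative only: `timeline_entry` is a hypothetical helper, the UTC format is an assumption, and the event kinds are simply the ones listed above.

```python
from datetime import datetime, timezone
from typing import Optional


def timeline_entry(kind: str, note: str, when: Optional[datetime] = None) -> str:
    """Format one timeline line as a markdown bullet with a UTC timestamp.

    If `when` is omitted, the current time is used. A naive datetime is
    interpreted as local time by `astimezone`, so pass timezone-aware values
    when the source clock matters.
    """
    when = when or datetime.now(timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%SZ")
    return f"- {stamp} [{kind}] {note.strip()}"


if __name__ == "__main__":
    print(timeline_entry("alert fired", "checkout latency above threshold"))
    print(timeline_entry("hypothesis", "import queue is backing up"))
```

A fixed shape like this makes the raw notes easy to scan and search later without asking anyone to write polished prose mid-incident.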

If a call happens, record it when appropriate and keep the transcript with speaker labels. If screenshots, logs, or PDFs matter, drop them into the vault. Uploaded PDFs are converted to markdown so Capy can search them later.

The timeline is raw material. Do not polish it while the incident is still active. The goal is to preserve enough evidence that the postmortem does not rely on adrenaline memory.

After the incident, keep the raw timeline even if the postmortem becomes cleaner. The rough version often contains useful details that do not belong in the final writeup: a command that almost helped, a dashboard that misled you, or a vendor response time that shaped the mitigation.

## Ask Capy what changed since last time

When an incident resembles a previous one, ask Capy to search the vault before you reinvent the response. "Find past incidents involving checkout timeouts." "Show runbooks that mention the import queue." "What follow-up items came out of the last database failover?"

This can surface old mitigations, known bad paths, vendor details, and previous decisions. You still make the operational call. Capy just lowers the cost of remembering.

For bug-heavy incidents, connect the incident page to [How to Use AI Notes for Bug Triage and Technical Debt](/guides/developers-builders/bug-triage-technical-debt/). The boundary between a production incident and a recurring bug is often thinner than the tool names suggest.

## Turn postmortems into linked updates

A postmortem should not be a document that ends the conversation. It should update the working documents around the incident. Link the postmortem to the runbook, the bug note, the code review note, the dashboard, and any ADR (architecture decision record) that changed because of it.

Use a simple postmortem shape: summary, impact, timeline, contributing factors, what went well, what was hard, follow-up decisions, owners. Avoid theatrical blame and avoid pretending every cause is equally knowable. The job is to make future response easier.
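The shape above can be turned into a reusable skeleton so nobody starts from a blank page. This is a rough sketch, not a Docapybara feature: `postmortem_skeleton` and the trailing `Linked updates` section are assumptions added for illustration.

```python
from typing import Sequence

# Section names follow the postmortem shape described in this guide.
POSTMORTEM_SECTIONS = [
    "Summary",
    "Impact",
    "Timeline",
    "Contributing factors",
    "What went well",
    "What was hard",
    "Follow-up decisions",
    "Owners",
]


def postmortem_skeleton(title: str, linked_notes: Sequence[str] = ()) -> str:
    """Build a markdown skeleton with one empty section per heading."""
    lines = [f"# Postmortem: {title}", ""]
    for section in POSTMORTEM_SECTIONS:
        lines += [f"## {section}", "", "_To be filled in._", ""]
    # Assumption: a final section listing the notes this postmortem updates.
    lines += ["## Linked updates", ""]
    if linked_notes:
        lines += [f"- {note}" for note in linked_notes]
    else:
        lines.append("- (none yet: add the runbook, bug note, review note, dashboard, or ADR)")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(postmortem_skeleton("Checkout timeouts", ["Runbook: checkout service"]))
```

Leaving the `Linked updates` section visibly empty is deliberate: it makes a postmortem with no linked changes easy to spot.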

When the postmortem identifies a process change, update the runbook in the same sitting if you can. If not, create a follow-up item with owner and date. A postmortem without linked changes becomes a commemorative plaque.

Small postmortems are fine. A fifteen-line note that names the real follow-up is better than a polished document that arrives after everyone has moved on.

## Track follow-ups where they can be reviewed

Operational follow-ups need a home. An inline database works well: date, incident, follow-up, owner, status, linked note, and review date. Put it on an `Operational follow-ups` page or inside the incident hub.

This database should live beside the prose that explains how you review it. Docapybara supports inline databases inside markdown pages, so the table can sit right under the operating notes instead of becoming another destination.
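To make the columns concrete, here is one way the follow-up record and a "what is due for review" filter could be modeled. It is a sketch under assumptions: the field names mirror the columns listed above, the status values are invented for the example, and nothing here reflects how Docapybara stores inline databases.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass
class FollowUp:
    opened: date        # the "date" column
    incident: str
    follow_up: str
    owner: str
    status: str         # assumed values: "open", "in progress", "done"
    linked_note: str
    review_date: date


def due_for_review(items: Iterable[FollowUp], today: date) -> List[FollowUp]:
    """Return unfinished follow-ups whose review date has arrived, oldest first."""
    due = [i for i in items if i.status != "done" and i.review_date <= today]
    return sorted(due, key=lambda i: i.review_date)


if __name__ == "__main__":
    items = [
        FollowUp(date(2024, 1, 5), "Checkout timeouts", "Add rollback flag to runbook",
                 "sam", "open", "postmortem-checkout", date(2024, 1, 19)),
        FollowUp(date(2024, 1, 8), "Import queue stall", "Document vendor escalation path",
                 "ana", "done", "postmortem-import", date(2024, 1, 15)),
    ]
    for item in due_for_review(items, date(2024, 2, 1)):
        print(item.review_date, item.owner, item.follow_up)
```

The owner and review date are the fields that keep an item from going stale, which is why the filter keys on them rather than on the incident.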

Ask Capy before review: "Summarize open operational follow-ups and group them by risk area." Then use human judgment to decide what actually moves. The agent can gather. You decide.

## Preserve deploy and rollback notes from reviews

Many runbook updates originate in code review. A reviewer asks about rollback. Someone explains the deploy order. A migration has a risky backfill. If those details stay only in the pull request, they are hard to find during the next alert.

When a change affects operations, copy the durable part into the runbook or link to a review note. [Code Review Documentation That Outlives the Pull Request](/guides/developers-builders/code-review-documentation/) covers the review side of this workflow.

The practical question is: if this deploy wakes someone up, what would they need to know? Put that sentence where the operator will look, not only where the reviewer once saw it.

## Keep operational decisions explicit

Some operations choices are really architecture decisions: where state lives, how retries work, what the failure mode should be, which vendor is authoritative, how long to keep a queue item alive. Write those decisions down.

An ADR does not need to be grand. It can say: context, decision, consequences. The important part is making the reason searchable before someone changes the behavior in a well-meaning cleanup.
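As an illustration of how small that can be, the sketch below renders the three parts as a markdown note. The `ADR` class is hypothetical, and the example decision about queue retention is invented purely to show the shape.

```python
from dataclasses import dataclass


@dataclass
class ADR:
    title: str
    context: str
    decision: str
    consequences: str

    def to_markdown(self) -> str:
        """Render the record as a short markdown note with three sections."""
        parts = [
            f"# ADR: {self.title}",
            "",
            "## Context",
            "",
            self.context,
            "",
            "## Decision",
            "",
            self.decision,
            "",
            "## Consequences",
            "",
            self.consequences,
        ]
        return "\n".join(parts) + "\n"


if __name__ == "__main__":
    # Invented example content; replace with a real decision.
    adr = ADR(
        title="Queue item retention",
        context="Stalled import jobs were retried indefinitely during an incident.",
        decision="Queue items expire after a fixed retention window.",
        consequences="Operators must re-enqueue expired jobs by hand after long outages.",
    )
    print(adr.to_markdown())
```

Writing the consequences down is what protects the behavior from a later cleanup, since it records what breaks if the decision is reversed.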

Use [Architecture Decision Records, Kept Where Your Agent Can Read Them](/guides/developers-builders/architecture-decision-records-ai-notes/) for the decision format. Then link the ADR from the runbook and the postmortem so Capy can follow the chain.

## Keep the runbook shelf current

Schedule a light runbook review after meaningful incidents and before planned high-risk work. Ask Capy to compare the latest incident notes against the relevant runbook and list mismatches. Treat the output as a checklist, not a verdict.

The shelf does not need to be perfect. It needs to be trusted enough that someone opens it under pressure. That trust comes from small updates made close to the work.

For a broader agent-in-documents view, see [Claude Code for Documents](/blog/claude-code-for-documents/). Try Docapybara free at [the signup page](/accounts/signup/) if your runbooks and postmortems keep living in different places. Start with one service, one runbook, and the last incident you still remember clearly. Let the first pass be plain; the important part is putting the operational memory where Capy can find it before the next alert.