Data work creates more context than the final notebook shows. There are dataset quirks, feature choices, false starts, evaluation caveats, meeting notes, model cards, stakeholder questions, and the small sentence in a paper that explains why you changed direction.

When that context stays scattered, the next experiment starts too cold. You rerun something already tried. You compare results without remembering the data split. You keep a model because the metric improved, then later realize the input changed under it.

Docapybara is not a replacement for your training platform, notebook environment, or model registry. It is the searchable workspace around them: the place where experiment notes, source documents, decisions, and follow-up questions live in plain markdown, with Capy available to search and synthesize when you ask.

## Start with the research question

Every experiment should point back to a question. "Can we reduce false positives on vendor invoices without losing recall?" is better than "try model v3." "Does adding support-ticket text improve churn prediction?" is better than "new feature set."

Put the question at the top of the project page. Under it, write what would count as useful evidence, what constraints matter, and what you are not trying to solve yet. This keeps the work from drifting into a parade of runs with no interpretation.
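To make that concrete, here is one possible shape for the top of a project page. The churn question comes from above; the evidence, constraints, and out-of-scope items are made-up placeholders to replace with your own.

```markdown
# Does adding support-ticket text improve churn prediction?

**Useful evidence:** lift in recall at fixed precision on the holdout set,
checked separately for small and enterprise accounts.

**Constraints:** ticket text stays inside the existing privacy boundary;
the scoring latency budget is unchanged.

**Not solving yet:** multilingual tickets, real-time scoring.
```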

If your work involves evaluating a product, company, or data room, [AI Notes for Technical Due Diligence](/guides/developers-builders/technical-due-diligence-notes/) uses a similar evidence-first structure.

## Keep dataset notes close to experiment notes

Dataset details are easy to treat as background, but they often explain the result. Keep a dataset note for each important source. Include owner, refresh cadence, filters, known gaps, labeling rules, privacy constraints you are responsible for honoring, and examples of rows that are weird in useful ways.

When you run an experiment, link to the dataset notes it depends on. If the dataset changes, update the note and mark which experiments used the old version. You do not need a heavy system to get value from this. A dated note with links beats a perfect registry nobody reads.
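A dataset note can be a page of bullets. A minimal sketch covering the fields above, where every name and value is a placeholder:

```markdown
# Dataset: vendor-invoices

- **Owner:** finance data team
- **Refresh cadence:** nightly
- **Filters applied:** test accounts and voided invoices excluded
- **Known gaps:** currency field missing before 2023
- **Labeling rules:** "duplicate" requires agreement from two reviewers
- **Privacy constraints:** raw invoice PDFs stay inside the vault
- **Weird-but-useful rows:** credit notes that look like duplicates

Used by: experiments/2024-05-02-fp-reduction.md (ran on the pre-refresh version)
```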

Uploaded PDFs, papers, and exported reports can live in the same vault. Docapybara converts PDFs to markdown so Capy can search them as text. That makes it easier to ask, "Which papers mentioned calibration for this kind of classifier?" without manually reopening every file.

## Track runs with a small inline database

For experiment runs, use an inline database via the `:::database:::` directive. Keep the columns practical: run date, question, dataset version, model or method, key parameters, metric summary, result link, status, and next step.
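The exact `:::database:::` syntax is whatever your Docapybara workspace expects, so treat the following as a plain-markdown sketch of those columns with invented values, not literal directive output:

```markdown
| Run date   | Question                 | Dataset ver. | Method     | Key params      | Metric summary          | Result link | Status      | Next step          |
| ---------- | ------------------------ | ------------ | ---------- | --------------- | ----------------------- | ----------- | ----------- | ------------------ |
| 2024-05-02 | FP reduction on invoices | v3           | GBM rerank | depth=6, lr=0.1 | FP -12%, recall -0.4 pt | (link)      | Promising   | Rerun on v4 data   |
| 2024-05-09 | FP reduction on invoices | v4           | GBM rerank | depth=6, lr=0.1 | FP -9%, recall flat     | (link)      | Needs rerun | Check label sample |
```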

The database should not replace your tooling. It should give you a readable map. Your training platform can hold artifacts and logs. The Docapybara page explains what each run was trying to learn and what you think the result means.

For engineering teams, this resembles the reproduction table in [How to Use AI Notes for Bug Triage and Technical Debt](/guides/developers-builders/bug-triage-technical-debt/). Both workflows benefit from writing down the attempt, the changed variable, and the observed result.

## Write interpretation separately from metrics

Metrics are not interpretation. Keep them near the work, but don't let them be the whole note. After each important run, write a short interpretation: what changed, what improved, what got worse, what might be confounded, and what you would try next.

This is where calm language matters. Avoid declaring a winner too early. Say "This run improved validation precision, but the labeling sample may overrepresent enterprise accounts." Say "The metric moved, but the error examples look worse for small customers." Those sentences are not glamorous. They are useful.

Capy can help by reading a run note and asking for missing interpretation fields. It can also compare two notes and list what changed between them, which is often the exact thing your tired brain does not want to reconstruct.

## Preserve failed experiments

Failed experiments are part of the map. Do not delete them just because they are messy. A failed run may explain why a future approach is risky, why an apparently obvious feature was rejected, or why a stakeholder request is harder than it sounds.

Use statuses like `Promising`, `Inconclusive`, `Rejected`, and `Needs rerun`. Add one sentence to rejected runs explaining why. "Rejected because improved aggregate score came from overfitting high-volume customers" is a gift to future-you.

This habit also helps when a model decision becomes an architecture decision. If the serving path, data contract, or retry behavior changes because of an experiment, connect the final choice to [Architecture Decision Records, Kept Where Your Agent Can Read Them](/guides/developers-builders/architecture-decision-records-ai-notes/).

## Use Capy for synthesis, not silent authority

Capy is useful for summarizing a set of experiment notes, finding related dataset caveats, drafting a model-readiness memo, or comparing the stated question against the actual runs. It should not become an unquestioned judge of model quality.

Ask grounded questions. "Summarize the last five churn experiments and separate observed results from interpretation." "Find notes that mention calibration, class imbalance, or threshold tuning." "Draft a stakeholder update using only the linked experiment notes."

Because the agent works from your vault, its output is easier to inspect. You can follow links back to the notes and decide what to trust. For more on using notes as model context, see [Using AI Notes as Context for Claude, ChatGPT, and Other AI Tools](/guides/developers-builders/notes-as-context-for-ai-tools/).

## Turn results into decisions people can read

At some point, an experiment becomes a decision: ship the model, hold it, collect more labels, change the target, simplify the method, or stop the project. Write that decision in plain English.

Include the question, the evidence considered, the chosen path, the risks, and the follow-up date. If the decision affects engineering, link to review notes or API docs. [How to Use AI Notes for Code Review Documentation](/guides/developers-builders/code-review-documentation/) is useful when the model work becomes a change someone has to review and maintain.
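A decision note can stay short. The fields below follow the list above; the dates, counts, and run names are invented for illustration.

```markdown
# Decision: hold the invoice model and collect more labels

- **Question:** Can we reduce false positives on vendor invoices without losing recall?
- **Evidence considered:** runs 2024-05-02 and 2024-05-09, plus the error review of small-customer invoices
- **Chosen path:** hold the current model; label 2,000 more small-customer invoices first
- **Risks:** labeling budget may slip; the aggregate metric may look flat for a quarter
- **Follow-up date:** 2024-06-15
```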

This decision note is especially important for non-data stakeholders. They may not need every run. They do need to know what you concluded and why.

## Keep the notebook and the narrative connected

The notebook is where you compute. The vault is where you remember. Keep links between them. A notebook cell can link to the experiment note. The experiment note can link to the artifact, chart, paper, data source, and decision.
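That linking can be as light as a markdown cell at the top of the notebook and a matching line in the note. The paths below are placeholders.

```markdown
<!-- first markdown cell of the notebook -->
Experiment note: vault/experiments/2024-05-09-invoice-fp-rerank.md

<!-- in the experiment note -->
Notebook: notebooks/invoice_fp_rerank_v4.ipynb
Artifacts: runs/2024-05-09/ (model, PR curve chart)
Decision: decisions/hold-invoice-model.md
```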

This gives Capy enough context to help without pretending the vault is your whole data stack. It can read the narrative, search the supporting notes, and draft the next memo. Your specialized tools keep doing the specialized work.

Try Docapybara free at [the signup page](/accounts/signup/) if your experiment history is split between notebooks, chats, PDFs, and memory. Start with one active model question, create the project page, write down the next run before you run it, and leave yourself enough context to trust the result later.