A research swipe file is the running collection of source material you save because you might need it. Articles, papers, reports, interviews, screenshots, statistics, quotes. Most people have one. Most people never look at it again. The folder of bookmarks, the Pocket queue, the Drive folder labeled "research" — they're functionally a write-only archive. Saved, never retrieved.

This guide is about building a research swipe file you can actually retrieve from. The same shape works whether you're researching a book, prepping a quarterly report, building a market analysis, or supporting a long-running content operation. The capture is easy. The retrieval is the actual job.

## Why most research files become useless

The standard failure mode is consistent. You save a great article in week one. By week eight, you've saved sixty more. By week sixteen, the file has two hundred entries and you've stopped opening it because the cost of finding the right article exceeds the benefit of having saved it.

The root cause is that the file holds the source but not your reason for saving it. Six months later, you remember vaguely there was an article about something — but not what made it useful, what claim it supported, or which project it might serve. So you re-search Google instead of using the file you built.

The fix has three parts. Capture has to include your own commentary, even just two lines. The file has to be searchable as text, not just by title. And the agent has to be able to read across the whole file when you ask a question, so retrieval becomes "what do I know about X?" instead of "what did I save about X?"

The shape that holds:

- **A capture habit** with mandatory two-line commentary at the moment of saving.
- **PDFs and articles converted to searchable text** so the actual source material is readable, not just the titles.
- **A theme structure** that maps each source to the projects or threads it might serve.
- **An agent that reads across the whole file** when you ask retrieval questions.

## The capture pattern — two lines, every time

The discipline that separates a useful research file from a write-only one is the two-line rule. When you save something, you write two lines about it.

What it is. Why you saved it.

That's it. Thirty seconds. The friction has to be that low or you won't do it consistently. But the two lines do enormous work later. They're the difference between "I remember saving an article about regulatory capture" and "I have a 2024 piece by Susan Athey on regulatory capture in financial markets, saved because the section on enforcement asymmetry connects to the antitrust thread."

The two lines should be specific. "Useful" is not specific. "Has the data on user retention by cohort that I'd need for the proposal" is specific. "Interesting" is not specific. "Pushes back on the standard view of network effects with a counterexample I want to address" is specific.
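One possible entry shape, using the Athey example from above. The field labels and layout here are illustrative, not a required format — the only rule is that the two lines sit at the top, before the source material:

```markdown
# Athey 2024 — Regulatory Capture in Financial Markets

**What:** 2024 paper on regulatory capture, with a section on enforcement asymmetry.
**Why:** Connects to the antitrust thread; candidate citation for the enforcement argument.

---

(converted source text below)
```

The commentary-first layout matters for retrieval: when the agent scans entries, your two lines are the first thing it reads, so the "why" frames everything underneath.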

For PDFs of papers or longer reads, drop the PDF on the entry's page. It auto-converts to markdown via docstrange so the paper becomes searchable text. The two-line commentary lives at the top; the source material lives below it.

## The theme structure — what the file is actually for

A research file works better when it's organized by theme rather than by source type. You don't usually need "all the PDFs I saved." You need "everything I have on a particular question."

A "Themes" sub-vault holds the themes you're actively researching. Each theme is a page. Each saved source gets tagged or linked to the theme it serves. For sources that touch multiple themes, link them to all the relevant pages.
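A theme page can be as simple as a heading and a list of links back to entries, each with a one-line reminder of why the source is there. This layout is one illustrative option, not a prescribed structure:

```markdown
# Theme: Customer Retention

Sources:
- [[Cohort retention report]] — data on user retention by acquisition channel
- [[Mid-market CS interview]] — why mid-market needs a different motion than enterprise
```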

When you start a new project — a report, an article, a presentation — open the relevant theme page first. The agent reads across everything tagged to that theme: "From everything in the customer retention theme, pull the strongest sources on the cohort analysis question. For each, summarize what it claims and how it might support or push back on my argument."

You get a working synthesis grounded in your own saved material. Not a generic literature review — a custom one based on the sources you've curated.

For long-running practices that build up the same themes over years, the theme structure compounds. The book you're writing in 2027 will draw from sources you started saving in 2024. Without the structure, those sources are unfindable. With it, they're a real asset.

## The retrieval pass — questions, not folders

The shift that makes the file useful is moving from folder-based retrieval to question-based retrieval. Instead of opening the right folder and scanning, you ask a question and the agent answers from across the file.

Examples of the questions:

- "What do I have on the question of why mid-market customer success teams need a different motion than enterprise?"
- "Pull every source I've saved that includes data on user retention by acquisition channel."
- "Find the article I saved last spring about the Federal Reserve's communication strategy. Summarize it and pull the most quotable sentence."
- "I'm writing a section on network effects with counterexamples. What do I have that's relevant?"

The agent reads across the file and answers. Some of the answers are wrong; you check them against the source. Most are correct enough to be a starting point. The retrieval cost has dropped from twenty minutes of folder scanning to thirty seconds of asking and reading.

For specific quotes or statistics you need to verify before citing, the agent can pull the relevant passage with a page reference: "In the Athey paper, find the section on enforcement asymmetry. Pull the relevant paragraphs with page numbers." You get the citation back ready to verify against the original.

## When the source is audio or video

Research isn't just text. Conference talks, podcast episodes, interviews — all of it is potentially research material, and most of it dies in a Pocket queue you'll never revisit.

The pattern: download the audio (or save the video and extract audio). Drop the file on a page in your vault. The transcription runs with speaker labels. The talk that was an hour of audio becomes searchable text the agent can read.

Now the talk is part of the research file. "Pull every research source — text or audio — that mentions the question of regulatory capture. List the audio sources with the timestamp where the speaker addresses it." You get a unified result across modalities, not a separate search per source type.

For longer research projects involving many recorded sources — qualitative research interviews, customer discovery calls, stakeholder conversations — the same workflow scales. Drop the audio. Add two lines of commentary. Tag to the relevant theme. The transcript becomes part of the searchable file. The same audio-as-research pattern shows up in [AI notes for journalists](/guides/creatives-content/journalists-sources-research/) — same mechanic, different application.

## The duplicate problem and the seeing-it-twice signal

In any real practice that runs for months, the same source eventually shows up twice. You saved an article in March, forgot you saved it, saved it again in August. Most people see this as a failure of the system. Treat it as data instead.

When you find a duplicate, that's a signal — the source is unusually relevant to whatever you've been thinking about. Either it touches multiple themes (in which case, link it to all of them) or your interest in it has shifted (in which case, update the commentary on both entries to reflect what changed).

The agent helps surface the duplicates: "Read across the research file. Find sources that are likely duplicates — same author, similar title, similar topic." You get a list. You merge or annotate. The file gets cleaner without you spending an afternoon on it.
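If you'd rather surface duplicate candidates mechanically than ask the agent each time, the same "same author, similar title" heuristic is easy to sketch. This is a minimal illustration, assuming a folder of markdown entries whose first `# ` heading is the source title — the function names and threshold are hypothetical, not part of any product API:

```python
from difflib import SequenceMatcher
from itertools import combinations
from pathlib import Path


def first_title(path: Path) -> str:
    """Return the entry's first '# ' heading, lowercased; fall back to filename."""
    for line in path.read_text().splitlines():
        if line.startswith("# "):
            return line[2:].strip().lower()
    return path.stem.lower()


def likely_duplicates(folder: str, threshold: float = 0.85):
    """Compare every pair of entries; flag pairs whose titles are near-identical."""
    files = sorted(Path(folder).glob("*.md"))
    pairs = []
    for a, b in combinations(files, 2):
        ratio = SequenceMatcher(None, first_title(a), first_title(b)).ratio()
        if ratio >= threshold:
            pairs.append((a.name, b.name, round(ratio, 2)))
    return pairs
```

A fuzzy-match threshold rather than exact equality is the point: a March save titled "Regulatory Capture in Financial Markets" and an August save titled "Regulatory capture in financial markets (Athey)" should still land in the same pair.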

## The pruning pass — once a quarter

A research file gets stronger when you prune it. Not by deleting sources, but by reviewing them and updating the commentary. The article you saved last spring with one line of commentary is probably worth a fuller note now that you've thought more about the topic.

Once a quarter, block an hour. Open the theme pages. Scan the entries. For each one, ask: is the commentary still accurate? Is the tagging still right? Is there a better source you've since found that this one points toward?

The agent helps with the structural pass: "For each theme page, list the entries that haven't been touched in the past six months. Suggest which ones might be candidates for archiving and which ones might need updated commentary based on more recent sources."

You get a working list. You spend the hour updating, not deciding what to delete. The file stays alive instead of becoming a museum.

## The shareable version — when research becomes the deliverable

For a lot of research work, the file is internal scaffolding for a deliverable. Sometimes the file itself is the deliverable — a literature review for a client, a research memo for a team, a state-of-the-field document.

The vault makes the shareable version easy. Open the relevant theme page. Ask the agent: "From the entries on this theme page, draft a literature review covering the strongest sources, organized by sub-question. Include direct quotes where they're load-bearing for the argument. Cite each source with the title and date." You get a draft. You edit it into the deliverable.

The work that used to be a week of assembly compresses. The deliverable is grounded in real sources you've actually read, not in an AI's invented references. The verification step is still on you, but the assembly part is no longer the bottleneck.

For drafting the surrounding writing once the research is in place, see [how to draft emails, proposals, and newsletters inside your notes app](/guides/creatives-content/draft-emails-proposals-in-notes/) for the broader writing-in-the-vault workflow.

## A boundary on what AI should and shouldn't do

A practical note: the agent reads what you've saved and what you've written about it. It doesn't validate that the source is accurate, that your reading of it is correct, or that the citation is real. The verification work — going back to the source, reading the passage in context, checking that the quote isn't taken out of context — is still on you.

This matters most when the agent surfaces a source you don't fully remember. The two-line commentary you wrote in March may not still be accurate now. Re-read the source before quoting it. The vault makes the source easier to find. It doesn't certify the source's truth.

For the broader pattern of building reference material into a working asset, see [how to build a swipe file in your notes app](/guides/creatives-content/build-swipe-file-notes/) for the version focused on creative reference rather than research.

## A calmer way to research

A research file doesn't have to be a graveyard. The graveyard happens when the capture is mechanical, the file isn't searchable as text, and the retrieval has to happen by folder. The shape that holds is small mandatory commentary at capture, sources converted to text, tagging by theme, and an agent that reads across the file when you ask a question.

Try Docapybara free — [sign up](/accounts/signup/), build a theme page for one project you're researching, drop in five sources you've saved with two lines of commentary on each, and ask the agent for a synthesis grounded in those sources.