You've been studying Spanish on and off for a year. You have a deck of flashcards on one app, a notebook of sentences from your tutor, screenshots of grammar explanations from three websites, voice memos of your tutor explaining the subjunctive that you haven't listened to since you recorded them, and a Notes app full of "verbs to look up later". The next tutor session is on Thursday, and you have no idea what you actually know versus what you're still confused about.
Most language learners run into some version of this problem. The materials are everywhere. Learning happens in fragments: a podcast on the commute, a tutor session, a movie night with subtitles, a half-hour with the textbook. The fragments don't talk to each other, so a word you "learned" three months ago shows up again and you can't remember it.
A vault that holds your active vocabulary, your sentence patterns, your tutor logs, and your voice practice — with an agent that can find what you already know and quiz you on it — fixes most of it.
One language, one parent page, with the practice underneath
In Docapybara, each language you're studying gets a parent page. Pages nest with no depth limit, so under each language you can have child pages for Vocabulary, Grammar patterns, Conversations and tutor sessions, Voice practice, Reading log, Listening log, and Things to ask the tutor.
If you're studying multiple languages, group them under a Languages parent. Each language keeps its own scope; the agent can search across them when patterns overlap (as in the Romance languages, or anything in the Slavic family), but you mostly stay inside one language at a time.
For the broader habit of learning anything new, see AI Notes for Learning a New Skill: One Vault Instead of Five Apps — language learning is one of the most demanding versions of the same shape.
A vocabulary database that survives the long haul
The vocabulary section is where most learners over-engineer and under-deliver. Spaced-repetition apps optimise the review math beautifully, but they're black boxes — you can't easily see your own learning history, you can't ask questions about what you've struggled with, and exporting your data is usually painful.
A simpler version that scales: an inline database via the :::database::: directive on your Vocabulary page. Columns for word, part of speech, English meaning, example sentence (in the target language, ideally one you've actually heard or used), source (where you encountered it), date added, and notes (gender, irregular forms, false friends, anything to remember).
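To make the column list concrete, here is a minimal sketch of one vocabulary row modelled in Python. It is not the :::database::: directive's syntax, which isn't shown here; the class name, field names, and the sample row are all hypothetical and only illustrate the shape of the data.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class VocabEntry:
    """One row of the vocabulary table (field names are illustrative)."""
    word: str
    part_of_speech: str
    meaning: str          # English meaning
    example: str          # sentence in the target language
    source: str           # where you encountered the word
    date_added: date
    notes: str = ""       # gender, irregular forms, false friends


# A made-up sample row, only to show what a filled-in entry looks like.
entry = VocabEntry(
    word="madrugar",
    part_of_speech="verb",
    meaning="to get up early",
    example="Mañana tengo que madrugar.",
    source="podcast",
    date_added=date(2024, 3, 4),
    notes="regular -ar verb",
)

# The dictionary keys line up with the column headings listed above.
print(list(asdict(entry)))
```

Keeping the date and the source on every row is what later makes questions like "what did I add more than two weeks ago?" answerable.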
The agent can update this from voice or text. "Add the word [X] — verb, means [Y], heard it in the podcast about [topic], example sentence [Z]." Row appears.
The agent can quiz you on what you've added. "Quiz me on ten verbs from my Spanish vocabulary that I added more than two weeks ago — give me the meaning, ask me for the word." You get a session that's actually grounded in what you've been learning, not a generic frequency list.
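The selection behind a quiz request like that is simple enough to sketch. The following is an assumption about the logic, not a description of how the agent works internally: filter rows by part of speech and age, then sample up to ten. The names `VocabEntry` and `pick_quiz_items` and the sample rows are hypothetical.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class VocabEntry:
    word: str
    part_of_speech: str
    meaning: str
    date_added: date


def pick_quiz_items(entries, today, part_of_speech="verb",
                    min_age_days=14, count=10, rng=None):
    """Return up to `count` entries of the given part of speech
    that were added more than `min_age_days` days before `today`."""
    rng = rng or random.Random()
    cutoff = today - timedelta(days=min_age_days)
    eligible = [
        e for e in entries
        if e.part_of_speech == part_of_speech and e.date_added < cutoff
    ]
    # Never ask for more items than are eligible.
    return rng.sample(eligible, min(count, len(eligible)))


# Made-up rows for illustration.
today = date(2024, 4, 1)
entries = [
    VocabEntry("madrugar", "verb", "to get up early", date(2024, 3, 1)),
    VocabEntry("estrenar", "verb", "to use or wear for the first time", date(2024, 3, 30)),
    VocabEntry("sobremesa", "noun", "time spent chatting at the table after a meal", date(2024, 2, 10)),
]

for item in pick_quiz_items(entries, today):
    # Give the meaning, ask for the word.
    print(f"Meaning: {item.meaning} -> which word?")
```

With these rows only "madrugar" qualifies: "estrenar" was added two days before the quiz date, and "sobremesa" is a noun.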
For the broader version of building knowledge that survives — across any subject — see How to Build a Personal Knowledge Wiki Without Trying. The vocabulary database is one of the most concrete examples of the wiki principle.
Grammar patterns captured as you encounter them
The grammar mistake most learners make is studying a chapter on the subjunctive, taking notes, and then forgetting it because there's no system for noticing it in the wild. The fix is capturing patterns as you encounter them, with examples from the actual material you're consuming.
A Grammar patterns page (or one child page per major pattern — Subjunctive, Por vs. para, Past tenses, Pronoun placement) holds the rule explanation in your own words plus a growing list of real examples you've seen.
When the tutor explains something during a session, voice-record the explanation. The transcript drops on the relevant grammar page. "Tutor explained today that the subjunctive after [trigger phrase] is required because [reason]. Example she gave was [sentence]." Three months later when you're about to use that construction, the explanation in your tutor's actual words is on the page.
When you encounter the pattern in the wild — in a book, a podcast, a movie — drop the example on the page with a note about the source. "Heard this construction in the [show] episode about [topic], spoken by [character]." The agent can pull all your real-world examples of a given pattern. "Show me every example I've collected of the subjunctive in real-world Spanish."
Voice practice with transcripts and feedback
Speaking is the part of language learning that most learners under-practice because it's awkward. Recording yourself in the target language feels worse than recording in your native one. Most people skip it, so speaking is usually the skill that lags furthest behind.
Audio recording in Docapybara handles this. Tap record, speak in the target language for a minute or two on a topic, tap stop. You get a transcript with speaker labels (just you, in this case). The transcript is searchable and the agent can read it.
Useful prompts for self-practice:
- "Describe what you did this morning, in [language], for two minutes."
- "Argue both sides of [topic] in [language]."
- "Tell the story of [a book you read or a movie you saw] in [language]."
After the recording, you can ask the agent to review it. "Review the recording I just made and flag grammar mistakes, vocabulary I used incorrectly, and natural-sounding alternatives to phrasings that sound textbook-ish." You get a breakdown grounded in what you actually said.
The agent can also pull patterns over time. "Look at my voice practice from the past month — which mistakes have I been making consistently?" The answer is grounded in your actual recordings, which is much more useful than a generic "common mistakes" list.
For the broader voice habit, see The Complete Guide to Voice-First Note-Taking — language learning is one of the highest-payoff applications of voice capture.
Tutor sessions that compound across months
For learners working with a tutor, the sessions are dense. New vocabulary, grammar corrections, cultural context, conversation practice, the tutor's notes on what you should focus on next. Trying to capture all of this with handwritten notes during a 50-minute session means you miss most of it.
Recording the session (with the tutor's permission) solves this. You get a transcript with speaker labels. After the session, the agent can summarize. "Summarize today's tutor session — new vocabulary introduced, grammar points she explained, mistakes she corrected, anything she said I should work on before next week." You get a clean writeup on top of the full transcript.
A Tutor sessions parent page holds one child per session, dated. Over months, the agent can read across them. "What grammar points has my tutor flagged most often as something I need to work on?" The pattern is grounded in actual sessions, not your impression.
For recurring weekly conversation partners or language exchanges, the same shape applies. The conversation transcript is the practice material, and the agent can extract vocabulary you used (or struggled to use) for the active vocabulary database.
Reading and listening logs that turn input into searchable material
Most language input (podcasts, books, shows) passes through and disappears. A small amount of structure makes it stay. A Reading log page and a Listening log page, with one entry per book, podcast, or show you've worked through, hold the title, the date, how difficult you found it, and any vocabulary or expressions worth remembering.
For audio content specifically, you can paste transcripts (when available) directly. The agent reads them as text. "Find every time the host of [podcast] used [expression] across the episodes I've logged." The pattern emerges from the actual content you've consumed.
For shows you watch with subtitles, the captions don't usually export easily, but you can voice-note an interesting line or expression as it comes up. "In episode three of [show], the character used [phrase] in a context where I'd have used [other phrase] — want to ask the tutor about the nuance." The note lands; the agent surfaces it for the next tutor prep.
Conversations with the agent in the target language
The agent itself can be a conversation partner. Open a chat in the target language and have a conversation about your day, a topic you care about, or something you're trying to express. The conversation is logged; you can review it later for vocabulary you didn't know how to use, or for patterns the agent used that you'd want to learn.
This isn't a replacement for human practice — the agent isn't a fluent native speaker with cultural context — but it's available at any hour, and the conversation log becomes more material for your active vocabulary database. "Pull from yesterday's chat in Italian the words I had to look up or didn't quite know how to use." You add them to the vocabulary database for next week's review.
A starter shape that fits a real practice
If you're starting (or restarting) language study this week:
- One parent page for the language.
- A vocabulary database with the columns above.
- A grammar patterns page (or per-pattern children) with explanations in your own words.
- A tutor sessions parent, with one child per session if you have a tutor.
- A voice practice page where weekly recordings land.
- A reading and listening log for the input you're consuming.
That's it. No flashcard system to maintain, no streak to break. The vault grows as your practice does; the agent finds the patterns.
The point isn't to over-engineer the study. It's that the small amount of structure you keep means the language you're learning becomes a body of material you've actually accumulated — your own examples, your own corrections, your own progress — instead of a series of fragments that don't compound.
Try Docapybara free — start with the vocabulary database for one language and the next tutor session, and let the practice find its rhythm from there.