19 March 2025

The Word You Used Then

By Asgeir Albretsen · 5 min read

knowledge-management, retrieval, search, notes

Your notes are hard to find not because they're disorganized, but because past-you wrote in words that present-you doesn't think in.

You search for "meeting notes from the product discussion." Nothing. You try "product feedback synthesis." Still nothing. You find it, eventually. The document is called "Thoughts after talking to Lars."

That's not a filing problem. It's a vocabulary problem — and it turns out to be far harder to fix than rearranging folders.

In 1987, George Furnas, Thomas Landauer, and colleagues at Bell Communications Research published a study in Communications of the ACM on what they called the vocabulary problem. They asked people to name objects and concepts in ordinary domains (computer commands, physical items, everyday actions), then measured how often two people independently chose the same word. The answer: 10 to 20 percent of the time. The other 80 to 90 percent of the time, two people describing the same thing pick different words. Their framing was about software interface design: if you call a command "delete" but the user types "remove," they'll never find it. But buried in the implication was something stranger, something that didn't become fully relevant until personal software caught up.

The person you're most likely to have a vocabulary mismatch with, when searching your own notes, is yourself.

Your vocabulary drifts

The note you wrote in 2021 was written in 2021 language. The project wasn't yet called "the infrastructure migration" — it was "the backend thing." The person you wrote about wasn't yet "a strategic partner" — they were someone you'd grabbed lunch with. The decision you recorded wasn't a "pricing model change" — it was a question you needed to think about more.

In 2025, you search using the terms that crystallized later: the canonical language that emerged over months of meetings, documents, and decisions. Past-you didn't have those terms. Past-you was still in the middle of figuring out what the thing was.

This is personal semantic drift. Not inconsistency, exactly — consistency with the version of you who existed then. A note faithfully records the mental vocabulary of the moment it was written. Search assumes the vocabulary of right now.

Zhao and Callan, in a 2010 study on information retrieval, found that the average query term fails to appear in 30 to 40 percent of the documents genuinely relevant to that query. That was in a professional corpus, where writers were trying to be precise and findable. Personal notes, written as thinking artifacts for no specific future audience, are almost certainly worse.
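To make that measure concrete, here is a minimal sketch of how a term mismatch rate could be computed for one query term. The function name `mismatch_rate` and the five sample notes are invented for illustration; they are not data from the study.

```python
def mismatch_rate(term, relevant_docs):
    """Share of relevant documents that do not contain the query term."""
    if not relevant_docs:
        return 0.0
    term = term.lower()
    # Whitespace tokenization only; a real system would also handle
    # punctuation and word forms.
    missing = sum(1 for doc in relevant_docs if term not in doc.lower().split())
    return missing / len(relevant_docs)


# Hypothetical notes, all relevant to a search about pricing.
relevant = [
    "how should we charge for this",
    "pricing tiers for the new plan",
    "what customers will pay per seat",
    "pricing page draft",
    "notes on pricing experiments",
]

rate = mismatch_rate("pricing", relevant)
print(f"{rate:.0%} of the relevant notes never use the word 'pricing'")
```

In this toy set, two of the five relevant notes never use the word, so a keyword search for "pricing" misses them even though they are exactly what you wanted.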

Why tags don't solve this

The standard advice is to tag things carefully. The problem is that tags encode the vocabulary you had at the moment you applied them. "Interesting" means nothing in a year. "Process improvement" tagged in 2021 won't match "ops efficiency" searched in 2023. "Project X" made sense when Project X had a name; it stopped making sense when the project became something else.

Folders have the same structural weakness. The folder called "Q3 2022" tells you when, not what. The folder called "Random" got that name because you couldn't decide where things belonged at the time, and that undecided moment is almost always when note-taking happens. Even thoughtfully named folders like "Customer Research" encode the label as you understood it then, not the question as you ask it now.

This isn't a discipline problem. Good organization requires knowing what something is in a settled, finished way. Most notes are written before that's possible. You write them in the middle of thinking, when the thing is still becoming.

What structure actually changes

Semantic search, which finds by meaning rather than by exact string, directly addresses the vocabulary problem. If you search for "pricing model change" and past-you wrote "how should we charge for this," those are the same idea, and a vector embedding can place them close together even though they share no words. The 25 to 35 percent retrieval improvement commonly reported for semantic search over pure keyword matching doesn't come from keyword search being badly implemented. It comes from the vocabulary gap, which is structural. You can't keyword-search your way around it.

But semantic search alone is fuzzy by design. It's good at "find things like this" and weaker at "find the specific thing I know exists." For personal knowledge — where you often have a strong sense that something is there — you need both modes.
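Here is a rough sketch of what "both modes" could look like side by side. It assumes each note already has an embedding. The three-dimensional vectors below are hand-written stand-ins (loosely: pricing, infrastructure, people), not the output of any real model, and `keyword_hits`, `semantic_hits`, and `hybrid_search` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_hits(query, notes):
    """Exact mode: notes whose text contains every query word."""
    words = query.lower().split()
    return [n for n in notes
            if all(w in n["text"].lower().split() for w in words)]


def semantic_hits(query_vec, notes, k=2):
    """Fuzzy mode: the k notes whose vectors are closest to the query vector."""
    ranked = sorted(notes, key=lambda n: cosine(query_vec, n["vec"]), reverse=True)
    return ranked[:k]


def hybrid_search(query, query_vec, notes, k=2):
    """Exact matches first, then semantic neighbours not already listed."""
    exact = keyword_hits(query, notes)
    fuzzy = [n for n in semantic_hits(query_vec, notes, k) if n not in exact]
    return exact + fuzzy


# Toy vectors, assigned by hand for illustration only.
notes = [
    {"text": "how should we charge for this", "vec": [0.9, 0.1, 0.2]},
    {"text": "the backend thing is slow again", "vec": [0.1, 0.9, 0.1]},
    {"text": "thoughts after talking to Lars", "vec": [0.3, 0.1, 0.9]},
]

# No shared words, so exact mode finds nothing and fuzzy mode carries the query.
for note in hybrid_search("pricing model change", [1.0, 0.0, 0.1], notes, k=1):
    print(note["text"])

# A name you remember precisely: exact mode finds the note directly.
for note in hybrid_search("lars", [0.2, 0.1, 0.9], notes, k=1):
    print(note["text"])
```

Putting exact matches ahead of semantic neighbours is one design choice among several. It suits the case where you know the thing exists and remember a word from it.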

The more durable fix is structural. When a note about a person is linked to a typed person record, when a decision carries its project and date as first-class fields rather than as text inside the document, when a preference is explicitly stored as a preference, the vocabulary problem shrinks. You're not searching by the words you happened to use. You're querying by what something is. The relationship doesn't degrade when your language changes.
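As a sketch of that idea, and not a description of any particular product's data model, typed records might look something like this. Every name here (`Person`, `Note`, `find`, the `proj-…` keys) and the dates are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Person:
    person_id: str  # stable key; unaffected by how you later describe them
    name: str


@dataclass(frozen=True)
class Note:
    title: str
    body: str
    kind: str                          # "decision", "preference", "meeting", ...
    project_id: Optional[str] = None   # stable key, not the project's current name
    written_on: Optional[date] = None
    people: Tuple[Person, ...] = ()


def find(notes, kind=None, project_id=None, person=None):
    """Filter notes by what they are, ignoring the words inside them."""
    return [
        n for n in notes
        if (kind is None or n.kind == kind)
        and (project_id is None or n.project_id == project_id)
        and (person is None or person in n.people)
    ]


lars = Person(person_id="p-001", name="Lars")
notes = [
    Note("Thoughts after talking to Lars", "How should we charge for this?",
         kind="decision", project_id="proj-pricing",
         written_on=date(2021, 3, 4), people=(lars,)),
    Note("The backend thing", "Servers keep falling over.",
         kind="meeting", project_id="proj-infra",
         written_on=date(2021, 5, 10)),
]

# No query words at all: the lookup is by type, project, and person.
for note in find(notes, kind="decision", project_id="proj-pricing", person=lars):
    print(note.written_on, note.title)
```

The title and body still carry 2021 vocabulary, but nothing in the query depends on it. If the project is later renamed, only its display name changes; the key the note points at stays the same.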

Furnas and his colleagues proposed "unlimited aliasing" as the practical solution: build in every possible word a user might choose, because users will always surprise you. The personal knowledge equivalent is simpler. Build structure that doesn't depend on your vocabulary staying stable.

It won't.

Asgeir Albretsen is the founder of Harbor.
