The Fifty-Year Question
In 1972, researchers built one of the first natural language interfaces to a database. It worked. Then the idea kept failing for fifty years, and what broke it tells you something important about personal knowledge.
In 1972, a team at Bolt Beranek and Newman built a system called LUNAR. It let lunar geologists ask questions in plain English about moon rock samples collected during the Apollo missions. "What rocks have a high aluminum concentration?" You typed it in English. You got back data.
It worked. That was the problem — it worked so well that everyone assumed the hard part was solved.
LUNAR's success launched fifty years of disappointment. The dream of asking your data questions in plain English kept reappearing, kept almost working, and kept failing in the same specific way. IBM built LanguageAccess. Microsoft shipped English Query in the late 1990s. A company called Natural Language Inc. made something called DataTalker. All of them are discontinued. The failure mode was consistent: impressive in the demo, broken in the wild.
What actually broke them
The failure wasn't what people thought. It wasn't that the systems couldn't understand English. LUNAR understood English fine — within its domain. The problem was vocabulary mismatch.
To use a natural language database interface, you had to know what words the system recognized. You had to know, say, that "rocks" mapped to a table called specimens and "concentration" was a column called pct_weight. If you said "stones" instead of "rocks," the system failed. If you asked about "samples," maybe that worked, maybe it didn't. You had to learn the system's vocabulary. That turned out to be almost as hard as learning SQL.
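To make the brittleness concrete, here is a minimal sketch of a fixed-lexicon translator. It is not how LUNAR or any of the commercial systems were built; it only illustrates exact-word lookup. The table and column names follow the example above, while the element column, the filler word list, and the threshold standing in for "high" are invented for illustration.

```python
import re

# Hypothetical lexicon. "specimens" and "pct_weight" follow the example in the
# text; the "element" column and the cut-off for "high" are invented here.
LEXICON = {
    "rocks": ("table", "specimens"),
    "concentration": ("column", "pct_weight"),
    "aluminum": ("element", "Al"),
}
FILLER = {"what", "which", "have", "has", "a", "an", "the", "with", "high"}
HIGH_THRESHOLD = 10.0  # arbitrary number standing in for "high"


class UnknownWord(ValueError):
    """Raised when the question uses a word the lexicon was never taught."""


def translate(question: str) -> str:
    """Turn one narrow class of English questions into SQL by exact word lookup."""
    slots = {}
    for word in re.findall(r"[a-z]+", question.lower()):
        if word in FILLER:
            continue
        if word not in LEXICON:
            # No synonyms, no guessing: an unlisted word is a hard failure.
            raise UnknownWord(word)
        kind, name = LEXICON[word]
        slots[kind] = name
    missing = {"table", "column", "element"} - slots.keys()
    if missing:
        raise UnknownWord(f"question never named a {sorted(missing)[0]}")
    return (
        f"SELECT * FROM {slots['table']} "
        f"WHERE element = '{slots['element']}' "
        f"AND {slots['column']} > {HIGH_THRESHOLD}"
    )
```

With this lexicon, "What rocks have a high aluminum concentration?" translates cleanly, and the same question with "stones" raises UnknownWord. Every synonym a user might reach for has to be added by hand, which is the vocabulary problem in miniature.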
The same brittleness hit the people building these systems. Researchers called it the portability problem: moving a system to a new domain required weeks of effort teaching it the local vocabulary from scratch. The dream never scaled beyond narrow, carefully curated domains. Edgar Codd had imagined, in his original 1970 relational model paper, that ordinary users would eventually be able to query data without knowing its physical structure. He was right about the destination. The route took longer than anyone expected.
What LLMs actually fixed
Large language models removed this specific failure mode. The vocabulary problem was always about bridging the words a human uses and the names a database uses: table names, column names, relationships. LLMs can do this mapping without being pre-programmed with synonyms. Give a model a schema, just the table definitions, and it can translate "show me all the rocks with high aluminum" into a valid SQL query with reasonable accuracy.
This is text-to-SQL, and it works. Not perfectly, not on arbitrarily complex queries, but well enough to be genuinely useful for the first time in fifty years.
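A rough sketch of the plumbing around that step, under some loud assumptions: the schema is invented (it reuses the specimens example), the prompt wording is mine, and stand_in_model is a placeholder that returns a canned query where a real model call would go. The one part that actually runs against a real engine is the check that the returned SQL compiles against the schema, using Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical schema, reusing the specimens example; not LUNAR's real layout.
SCHEMA = """
CREATE TABLE specimens (
    sample_id  TEXT,
    element    TEXT,
    pct_weight REAL
);
"""


def build_prompt(schema: str, question: str) -> str:
    """Hand the model the table definitions plus the question, nothing else."""
    return (
        "Given this SQLite schema:\n"
        f"{schema.strip()}\n\n"
        "Write one SQL query that answers the question. Return only SQL.\n"
        f"Question: {question}\n"
    )


def stand_in_model(prompt: str) -> str:
    """Placeholder for a real model call; always returns the same canned query."""
    return "SELECT sample_id FROM specimens WHERE element = 'Al' AND pct_weight > 10.0"


def compiles_against(schema: str, sql: str) -> bool:
    """Check that the query is valid for this schema without touching real data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        # EXPLAIN QUERY PLAN makes SQLite parse and plan the query
        # without actually running it.
        conn.execute("EXPLAIN QUERY PLAN " + sql)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


prompt = build_prompt(SCHEMA, "show me all the rocks with high aluminum")
candidate = stand_in_model(prompt)
usable = compiles_against(SCHEMA, candidate)
```

The check only proves the query refers to tables and columns that exist. Whether it answers the question you meant is a separate matter, and that is where "not perfectly" comes from.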
But the capability comes with a constraint that gets overlooked. It only works when the underlying data has shape.
The shape problem
Imagine asking: "What did I write about Sarah's new project last month?"
If Sarah is a typed entity in your knowledge base — a person record with an ID, linked to documents and tasks — this question is answerable. The query has a target. The model can find Sarah, find everything associated with her, filter by date, and return something useful.
If Sarah is just a name that appears in some notes, the query has no reliable answer. The AI will try. It'll run a keyword search for "Sarah" and return anything that matches. Some of it will be relevant. Some won't. There's no way to filter by relationship, because the relationship isn't stored anywhere — it's inferred from proximity in text, which is fragile.
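The difference is easy to see in a toy version. Everything below is invented for illustration: the record types, the IDs, the notes, and the dates. One function does what keyword search does, the other follows an explicit link from a note to a person.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    id: str
    name: str


@dataclass(frozen=True)
class Note:
    text: str
    written: date
    about: tuple[str, ...] = ()  # ids of the entities this note is linked to


# Invented sample data.
sarah = Person(id="p1", name="Sarah")
notes = [
    Note("Sarah's new project kicks off with a data audit.", date(2024, 5, 6), about=("p1",)),
    Note("S. wants the audit finished before the offsite.", date(2024, 5, 20), about=("p1",)),
    Note("Dinner with Sarah from the climbing gym.", date(2024, 5, 11)),
    Note("Sarah's old project retro.", date(2024, 3, 2), about=("p1",)),
]


def keyword_search(notes, term):
    """Match on the characters in the text; knows nothing about who is meant."""
    return [n for n in notes if term.lower() in n.text.lower()]


def about_entity(notes, entity_id, start, end):
    """Match on the stored link to an entity, filtered by date."""
    return [n for n in notes if entity_id in n.about and start <= n.written <= end]


mentions = keyword_search(notes, "sarah")
linked = about_entity(notes, sarah.id, date(2024, 5, 1), date(2024, 5, 31))
```

Here the keyword search returns three notes: it picks up the dinner with a different Sarah and the March retro, and it misses the note that only says "S." The linked query returns the two May notes about the right person, because the relationship was recorded and did not have to be guessed from the text.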
The same problem applies to "the Q3 hiring decision," "my preference for morning meetings," "the conversation where we agreed to change direction." These are things you know. They're probably written down somewhere. But if they live in unstructured prose, they exist in your knowledge base the way a word exists in a book — present, but not queryable by meaning. Full-text search finds the word "hiring." It doesn't find the decision about it.
The structure you didn't think you needed
Most note-taking tools are built around one primitive: the document. Documents have titles, timestamps, maybe tags. The content is prose. That's it.
Typed entities — person, task, preference, decision — look like an add-on. Extra complexity for people who want to build elaborate systems. The optional advanced feature for power users.
But the fifty-year history of natural language interfaces shows that structure is what makes language queries answerable in the first place. LUNAR worked because it had a schema. Every rock sample was a row, every property a column, every query had a target. The modern text-to-SQL equivalent works for the same reason. You're not querying the text of your notes. You're querying a graph of typed things, and the language model bridges between how you ask and what the database stores.
If your knowledge base is only documents, what you can ask of it is limited to what keyword search can find. Which is less than you think — because keyword search finds mentions, not meaning.
Here's the part that took me a while to sit with: the interesting move isn't technical. LUNAR worked in 1972 because it was constrained: a vocabulary of roughly 3,500 words over a database of moon rock analyses, every entry typed and structured. The personal knowledge equivalent of that constraint is a choice you make during capture, not a feature you turn on later. When something happens (a conversation, a decision, a new project), you can write a note about it, or you can record what it is. Those are different acts. The first produces text. The second produces a queryable fact.
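As a small sketch of those two acts side by side, with every name, field, and date made up for the example: the same event captured once as prose and once as a typed record.

```python
from dataclasses import dataclass
from datetime import date

# Captured as prose: readable, but the decision is only implied by the wording.
note = "Long call with Priya on Tuesday. We agreed to pause Q3 hiring until the budget review."


# Captured as a typed record: the same event, stored as what it is.
@dataclass(frozen=True)
class Decision:
    topic: str
    outcome: str
    decided_on: date
    participants: tuple[str, ...]


# Invented sample log.
log = [
    Decision("Q3 hiring", "pause until the budget review", date(2024, 7, 9), ("Priya",)),
    Decision("meeting times", "default to mornings", date(2024, 6, 3), ()),
]


def decisions_on(log, topic):
    """Look up decisions by what they are about, not by the words used."""
    return [d for d in log if d.topic == topic]


q3 = decisions_on(log, "Q3 hiring")
```

The prose version never uses the word "decision," so a search for it comes back empty. The typed version answers "the Q3 hiring decision" with a one-line lookup, and all of that difference was created at capture time.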
The query is just the dividend. The work is choosing, when you write it down, what kind of thing you're writing down.
Asgeir Albretsen is the founder of Harbor.