3 July 2025

The Semantic Layer Between Humans and Machines

By Asgeir Albretsen · 5 min read

Tags: structured-data, knowledge-base, ai, markdown

Not code, not prose. The case for structured Markdown blocks as the grammar for human-AI collaboration.

In 2004, Chris Lattner and Vikram Adve published a paper describing LLVM, a compiler framework built around a new kind of intermediate representation, LLVM IR. The idea was simple and strange: stop trying to translate directly from C to machine code. Put an abstraction layer in the middle: something that isn't quite a programming language and isn't quite CPU instructions, a form that can be reasoned about, optimized, and retargeted to any architecture without touching the original source.

It worked extraordinarily well. Twenty years later, many widely used languages, including Rust, Swift, Julia, and C and C++ via Clang, compile through LLVM. The intermediate representation turned out to be the right grammar for talking about programs without committing to either the human-facing side or the machine-facing side.

I've been thinking about whether documents need the same thing.

The problem with both extremes

Natural language is great for humans and nearly useless for machines. A paragraph that says "Kari prefers direct communication and usually responds better to written context before a call" is perfectly meaningful to anyone who reads it. Ask software to extract a structured fact from that sentence — one you can query, index, or pass to an agent — and you're back to parsing heuristics, hoping the model infers correctly.

Databases are the opposite. A row in a preferences table that says {topic: "communication_style", value: "direct", confidence: "high"} is completely queryable. You can filter it, join it, sort by confidence, pass it as clean context to any AI. But no one writes in a database. Nobody describes their thinking in SQL. The moment you move knowledge into a table, you lose the surrounding prose that made it meaningful — who told you, why it matters, what the exceptions are.

This is not a new problem. Carsten Dominik ran into it in 2003 when he built org-mode for Emacs. His insight was that you could embed structured data (task states, deadlines, table rows, metadata) directly inside plain text documents, using a lightweight syntax that was still readable without any tools. Org-mode files look like documents when you read them and behave like databases when you query them. For two decades, a devoted group of users has built entire knowledge systems on top of this idea.

The rest of the world mostly ignored it. Org-mode required Emacs, which required a certain kind of person. The tooling never arrived for everyone else.

What a structured block actually is

The concept isn't complicated. A structured block is a typed, named snippet embedded inside a document, adjacent to the prose that gives it context:

:::preference
topic: communication_style
value: prefers direct, written context before calls
source: user note
confidence: high
:::

A human reads the surrounding document and understands it. Software reads the block and indexes it. Neither has to sacrifice what it needs.
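As a rough illustration of the "software reads the block" half, here is a minimal Python sketch that pulls blocks like the one above out of a document. The grammar is an assumption inferred from that single example (a `:::type` line opens a block, a bare `:::` closes it, each line between is `key: value`), and `parse_blocks` is a hypothetical helper, not part of any existing tool. The sample prose is invented for the demonstration.

```python
import re

# Assumed grammar, inferred from the example block: ":::<type>" opens,
# a bare ":::" closes, and every line in between is "key: value".
BLOCK_RE = re.compile(
    r"^:::(\w+)[ \t]*\n(.*?)^:::[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def parse_blocks(text):
    """Return one dict per structured block found in text."""
    blocks = []
    for match in BLOCK_RE.finditer(text):
        record = {"type": match.group(1)}
        for line in match.group(2).splitlines():
            key, sep, value = line.partition(":")
            if not sep:
                continue  # skip blank or malformed lines
            record[key.strip()] = value.strip()
        blocks.append(record)
    return blocks


# Invented sample document: prose around the block from the article.
DOCUMENT = """\
Kari and I met at a conference. She likes to read before she talks.

:::preference
topic: communication_style
value: prefers direct, written context before calls
source: user note
confidence: high
:::

More prose follows the block.
"""

blocks = parse_blocks(DOCUMENT)
for block in blocks:
    print(block)
```

The prose on either side of the block is ignored by the parser and left untouched in the file, which is the property the dual representation depends on.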

The dual representation is the point. You write about a person in flowing prose — where you met them, what you learned, what you should remember. Embedded in the same document, a structured block holds the machine-readable version of the same knowledge. The document stays readable. The knowledge stays queryable. They don't drift apart because they live in the same file.

This is different from frontmatter, which is metadata about the document as a whole. A structured block is metadata inside the document, attached to specific content, with its own schema and lifecycle.
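To make the distinction concrete, the following Python sketch separates the two layers. It assumes the common convention of frontmatter fenced by `---` lines at the top of the file and the `:::type` block syntax shown earlier; `split_document`, the `line` and `context` fields, and the sample document are all hypothetical choices for illustration. Frontmatter comes back as one dictionary for the whole file, while each block carries its position and the prose line that precedes it.

```python
def split_document(text):
    """Separate document-level frontmatter from in-body structured blocks."""
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]

    # Frontmatter (assumed convention): a leading "---" ... "---" section.
    frontmatter, body_start = {}, 0
    if stripped and stripped[0] == "---" and "---" in stripped[1:]:
        end = stripped.index("---", 1)
        for line in lines[1:end]:
            key, sep, value = line.partition(":")
            if sep:
                frontmatter[key.strip()] = value.strip()
        body_start = end + 1

    blocks = []
    current = None   # block being collected, or None while reading prose
    last_prose = ""  # most recent non-empty prose line seen so far
    for number, line in enumerate(lines[body_start:], start=body_start + 1):
        s = line.strip()
        if current is None:
            if s.startswith(":::") and len(s) > 3:
                current = {"type": s[3:], "line": number,
                           "context": last_prose, "fields": {}}
            elif s:
                last_prose = s
        elif s == ":::":
            blocks.append(current)
            current = None
        else:
            key, sep, value = line.partition(":")
            if sep:
                current["fields"][key.strip()] = value.strip()
    # A block that is never closed is dropped rather than guessed at.
    return frontmatter, blocks


# Invented sample document for the demonstration.
DOCUMENT = """\
---
title: Notes on Kari
updated: 2025-07-03
---

Kari asked for an agenda before our last two calls.

:::preference
topic: communication_style
value: prefers direct, written context before calls
source: user note
confidence: high
:::
"""

frontmatter, blocks = split_document(DOCUMENT)
print(frontmatter)
print(blocks[0]["line"], blocks[0]["context"])
```

Recording the preceding prose line is only one way to model "attached to specific content"; a real tool might attach a block to a heading or a paragraph range instead.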

Why this matters when AI is involved

The way most AI tools handle personal knowledge right now is fragile. You describe something in natural language, the model extracts structured facts from it, and those facts are stored somewhere you can't inspect. If the extraction was wrong, or the model forgets, there's no obvious place to look and nothing you can correct.

The alternative is giving the model a structured source of truth you maintain. Not a note that says "I think Kari prefers short meetings," but a preference record with a topic, a value, a source, a confidence level. The model doesn't have to infer. It reads the record.

This changes the quality of AI responses in ways that compound over time. The difference between an AI working from preferred_meeting_length: short, source: observed and one working from a blob of narrative text is the difference between a structured query and an educated guess. The structured version is also auditable: you can look at it, disagree with it, edit it.
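A small Python sketch of what "working from records" could look like in practice: filter the records by confidence, then render them as plain lines to hand to a model. The `Preference` type, the three-level confidence vocabulary, the `select` and `render_context` helpers, and the sample records (including the confidence assigned to `preferred_meeting_length` and the low-confidence `timezone` entry) are assumptions made for illustration, not an existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preference:
    topic: str
    value: str
    source: str
    confidence: str  # assumed vocabulary: "low", "medium", "high"


CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def select(records, min_confidence="medium"):
    """Keep records at or above min_confidence, most confident first."""
    floor = CONFIDENCE_RANK[min_confidence]
    kept = [r for r in records
            if CONFIDENCE_RANK.get(r.confidence, -1) >= floor]
    return sorted(kept, key=lambda r: CONFIDENCE_RANK[r.confidence],
                  reverse=True)


def render_context(records):
    """Format records as one auditable line each, ready to pass to a model."""
    return "\n".join(
        f"- {r.topic}: {r.value} "
        f"(source: {r.source}, confidence: {r.confidence})"
        for r in records
    )


# Hypothetical sample records.
records = [
    Preference("preferred_meeting_length", "short", "observed", "medium"),
    Preference("communication_style",
               "prefers direct, written context before calls",
               "user note", "high"),
    Preference("timezone", "unconfirmed", "guess", "low"),
]

context = render_context(select(records))
print(context)
```

Every line the model receives traces back to a record you can open and edit, which is the auditability described above. The low-confidence guess never reaches the model unless you lower the threshold yourself.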

If the knowledge lives in blocks inside documents you own, you're in control. If it lives inside a model's hidden memory or a cloud database you can't open in a text editor, you're trusting something you can't inspect.

The compiler analogy, completed

What made LLVM genuinely powerful wasn't the IR itself — it was that the IR became a shared contract. The front-end didn't need to understand the target architecture. The back-end didn't need to understand C. The IR was the handshake between them.

Structured Markdown blocks want to be the same thing for documents and AI. The human writes and edits in a format that feels like writing. The AI reads and reasons over a format that feels like data. Neither side has to compromise because the intermediate layer handles the translation.

Dominik's org-mode insight was correct in 2003. The format just needed better tooling — something that didn't require Emacs, something that synced across devices, something that made the structured layer visible and editable rather than buried in a proprietary system nobody can open.

The IR still needs a compiler. But the format is not new, and we know it works.

Asgeir Albretsen is the founder of Harbor.
