17 March 2025

Publishing Your Knowledge Base to AI (Safely)

By Asgeir Albretsen · 5 min read

mcp · knowledge-base · privacy · ai-agents

A public MCP endpoint isn't just a developer curiosity. It's a fundamentally different way of sharing knowledge — not with human readers, but with AI systems that query it by permission.

In 2016, Tim Berners-Lee launched a project at MIT called Solid. The idea was simple and radical: instead of your data living on Facebook's servers or Google's servers, it would live in a "pod," a personal online data store you controlled. Apps would connect to your pod with permission, read what they were allowed to read, and write nothing you hadn't approved. The web would stop being a machine for extracting your data and start being a machine for working with it.

Solid never took off. The ecosystem didn't follow. The chicken-and-egg problem of personal data stores — nobody builds for a platform nobody uses — proved fatal. But Berners-Lee got the underlying structure right. He just built it for the wrong kind of client.

The right kind of client turns out to be AI.

What publishing to AI looks like

When most people think about giving an AI access to their knowledge, they're imagining one of two things: pasting something into a chat window, or turning on a memory toggle inside the AI app itself. Both of these give the AI your information. Neither of them lets you define what the AI can query, on what terms, with what constraints.

A public MCP endpoint is different. Anthropic released the Model Context Protocol in November 2024, and the surface-level story was "AI assistants can now talk to external tools." That's true. But the deeper story is about what it means to publish a typed interface to your knowledge — not to human readers, but to AI systems acting on behalf of human readers.

Here's a concrete version of what this looks like. You ask Claude something like "who's my main point of contact at Acme Corp?" Claude doesn't have this information from training. It doesn't have it from a system prompt. But you've connected your knowledge base via an MCP endpoint, and the people.find tool knows the answer: you added a person record for that contact six months ago, tagged to a project, with a note about how you prefer to reach them. Claude calls the tool, gets a structured response, and answers. That's not retrieval-augmented generation in the sense most people use the phrase. It's a query.
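To make the difference between a query and a text search concrete, here is a minimal sketch of what a tool like people.find might do behind the endpoint. Everything in it is an assumption for illustration: the Person fields, the sample records, and the people_find signature are hypothetical, and the MCP transport that would wrap this function is left out.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Person:
    name: str
    company: str
    role: str
    project: str
    contact_note: str


# Hypothetical sample records; a real knowledge base would load these from storage.
PEOPLE = [
    Person("Dana Reyes", "Acme Corp", "main contact", "Acme onboarding", "Prefers email"),
    Person("Lee Okafor", "Acme Corp", "billing", "Acme onboarding", "Prefers phone"),
]


def people_find(company: str, role: Optional[str] = None) -> list:
    """Return matching person records as plain dicts (a structured response)."""
    wanted = company.casefold()
    return [
        asdict(p)
        for p in PEOPLE
        if p.company.casefold() == wanted and (role is None or p.role == role)
    ]


result = people_find("Acme Corp", role="main contact")
print(result)
```

The client filters on typed fields and gets typed fields back, so the model never has to guess which passage of free text contains the answer.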

The permission question is not optional

The obvious failure mode here is a system that hands the AI your full credentials. If an AI client inherits all your access, it can see everything you can see. That's the "inherited permissions" problem, and it appears throughout enterprise AI deployments.

The more useful design is granular scopes. A writing assistant probably needs access to your Ideas/ folder and your active Projects. It almost certainly doesn't need to see your personal journal. A scheduling assistant might need to read your tasks and preferences, but shouldn't be able to modify your contact records. These distinctions become visible — and enforceable — only when your knowledge base exposes a typed interface rather than a raw data dump.
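One way to picture granular scopes is a deny-by-default check that runs before any tool touches a file. The sketch below assumes a simple model of per-client grants over folder prefixes. The client names, folder names, and the is_allowed helper are all hypothetical and do not come from any particular product.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    folder: str          # folder prefix, e.g. "Ideas/"
    actions: frozenset   # allowed actions, e.g. {"read"}


# Hypothetical grants: each client sees only the folders it needs.
GRANTS = {
    "writing-assistant": [
        Grant("Ideas/", frozenset({"read"})),
        Grant("Projects/", frozenset({"read"})),
    ],
    "scheduling-assistant": [
        Grant("Tasks/", frozenset({"read"})),
        Grant("Preferences/", frozenset({"read"})),
    ],
}


def is_allowed(client: str, path: str, action: str) -> bool:
    """Deny by default; allow only when a grant covers both path and action."""
    # Normalize first so "Ideas/../Journal/x.md" cannot slip past a prefix check.
    normalized = posixpath.normpath(path)
    for grant in GRANTS.get(client, []):
        if normalized.startswith(grant.folder) and action in grant.actions:
            return True
    return False


print(is_allowed("writing-assistant", "Ideas/draft.md", "read"))
print(is_allowed("writing-assistant", "Journal/today.md", "read"))
```

The journal is protected here by having no grant at all. Nothing has to be explicitly blocked.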

This is the part of the Solid design that translates directly. Berners-Lee's pods had fine-grained access controls. So should yours.

In practice, this means publishing an MCP endpoint involves actively deciding what to publish. Which folders are in scope? Which tools are available — read-only search, or structured queries, or proposed writes that require your approval? You're not sharing content. You're defining a surface.
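The "proposed writes that require your approval" option can be sketched as a pending queue: the AI client may suggest a change, but nothing touches the stored notes until the owner approves it. This is an illustrative model only. The KnowledgeBase class and its propose_write and approve methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedWrite:
    path: str
    new_text: str
    approved: bool = False


@dataclass
class KnowledgeBase:
    notes: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def propose_write(self, path: str, new_text: str) -> int:
        """Called on behalf of the AI client: queue a change, apply nothing."""
        self.pending.append(ProposedWrite(path, new_text))
        return len(self.pending) - 1  # proposal id

    def approve(self, proposal_id: int) -> None:
        """Called by the owner: apply a queued change to the notes."""
        proposal = self.pending[proposal_id]
        proposal.approved = True
        self.notes[proposal.path] = proposal.new_text


kb = KnowledgeBase(notes={"Projects/acme.md": "Kickoff scheduled."})
pid = kb.propose_write("Projects/acme.md", "Kickoff moved to Friday.")
before = kb.notes["Projects/acme.md"]   # unchanged while the proposal is pending
kb.approve(pid)
after = kb.notes["Projects/acme.md"]    # updated only after approval
print(before, "->", after)
```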

Why this inverts the normal model

Every major AI product stores your context inside itself. ChatGPT memory, Claude Projects, Gemini Personalization — the pattern is the same. Your data migrates into their system, and their AI has access to it. The exchange is convenience for custody.

Publishing an MCP endpoint from your own knowledge base is the inversion of that. The AI doesn't hold your context. It connects to yours. You keep the data. You define what's accessible. You revoke access when you want.

OAuth 2.1 became the standard for remote MCP authentication by mid-2025. Session-scoped authorization — where access expires at the end of a task rather than persisting indefinitely — is an emerging pattern that makes this even safer. The AI gets what it needs for the session. Nothing carries over.
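The session-scoped idea can be illustrated without any OAuth machinery. The sketch below is not an OAuth 2.1 implementation. It only models the property that matters here: a grant carries a set of scopes and an expiry, and it can be revoked early. The scope strings, client name, and helper names are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionGrant:
    client: str
    scopes: frozenset
    expires_at: float      # Unix timestamp after which the grant is dead
    revoked: bool = False

    def permits(self, scope: str, now: Optional[float] = None) -> bool:
        """True only while the grant is unrevoked, unexpired, and covers the scope."""
        current = time.time() if now is None else now
        return (not self.revoked) and current < self.expires_at and scope in self.scopes


def start_session(client: str, scopes, ttl_seconds: float,
                  now: Optional[float] = None) -> SessionGrant:
    """Issue a grant that lasts ttl_seconds from the moment of issue."""
    issued = time.time() if now is None else now
    return SessionGrant(client, frozenset(scopes), issued + ttl_seconds)


# A 15-minute grant issued at t=1000 for a hypothetical meeting-prep client.
grant = start_session("meeting-prep", {"people:read"}, ttl_seconds=900, now=1000.0)
print(grant.permits("people:read", now=1500.0))   # inside the session window
print(grant.permits("people:read", now=2000.0))   # after expiry
```

Passing now explicitly keeps the example deterministic. A real deployment would rely on the authorization server's token lifetimes rather than a local clock check like this one.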

None of this requires the knowledge base to be public. "Publish" is doing a lot of work in this article: it means making an interface available to authorized clients, not making content visible to anyone. Your journal entries don't move. Your person records don't move. What gets shared is a scoped capability to query them, and only with the AI clients you've authenticated.

What it forces you to get right

The act of deciding what to publish turns out to be useful in itself. It forces precision about what different AI tools actually need from you. A meeting prep assistant needs your contact records and recent project notes. A research assistant needs your ideas and reference documents. A task manager needs your tasks and preferences.

When your knowledge base is a blob of text in a system prompt, you tend to throw everything in. When it's a typed API with per-folder permissions, you have to think. That constraint is not a bug.

Berners-Lee imagined personal data pods for a web where people wanted sovereignty over their social graph. The need was real; the timing was early. Now, with MCP at 97 million monthly SDK downloads as of March 2026 and supported by every major AI vendor, the same architecture is worth building — not because users want to be their own server administrator, but because AI clients are finally capable enough to make a queryable personal knowledge base worth having.

The protocol exists. The permission model exists. Berners-Lee got his pods eventually — just not the ones he designed.

Asgeir Albretsen is the founder of Harbor.
