Build a personal MCP memory server for $0/month

Hosted AI memory products charge $20–$80 a month for what is, structurally, a key-value store with vector search.

I'm not knocking the products. Mem0, Letta, Zep — they all do real work, especially for teams. But for a single human who just wants their AI tools to stop forgetting their preferences, the math gets weird fast. $480 a year for memory? My entire Cloudflare stack for the same capability runs at $0.

The reason isn't that the hosted products are overpriced. It's that the underlying primitive — "store some rows, retrieve them by tag, embed them for semantic search" — is one of the cheapest things to run on a modern edge platform. Cloudflare Workers + D1 gives you global low-latency access to a SQLite database for essentially free at personal scale.

So I built Context Hub on top of it.

The actual cost breakdown

Let me show you the numbers, because everyone hand-waves this and then you find out three months in that there's a $40 surprise.

Cloudflare D1 free tier (current as of 2026):

  • 5 million row reads per day
  • 100,000 row writes per day
  • 5 GB storage total
  • Unlimited databases per account

Cloudflare Workers free tier:

  • 100,000 requests per day
  • 10 ms CPU time per request (more than enough for D1 queries)

What a heavy personal workload actually looks like:

I run Context Hub across Claude.ai, Claude Code, ChatGPT, Cursor, and Perplexity. Five clients, all-day usage, ~30 active projects. My current numbers from the last 30 days:

  • ~2,400 reads/day (0.05% of free tier)
  • ~210 writes/day (0.21% of free tier)
  • 34 MB storage (0.7% of free tier)

To breach the free tier, I'd need to grow this workload by roughly 475×: writes hit the 100,000/day cap first, while reads have more than 2,000× of headroom. That's not a single human anymore — that's a small company sharing one Context Hub. At which point you want the paid tier anyway, because $5/month is fine for a team.

Why I almost didn't build this

My first instinct was to use Mem0. They have a great API. The docs are clean. The free tier is real. I signed up, prototyped for two days, and then I hit the wall every hosted-memory user eventually hits: I didn't own the rows.

What that meant in practice: I couldn't read my own memory store without going through their API. I couldn't export it cleanly. I couldn't grep it. If they ever changed pricing or deprecated a feature, I'd be doing a migration project.

For a tool I wanted to use every day, indefinitely, that felt fragile. So I started looking at what it would take to host the primitive myself. Three things had to be true:

  1. Free at personal scale. Otherwise the math is worse than just paying Mem0.
  2. Deploys in one command. If setup takes an afternoon, the friction kills adoption — including my own.
  3. Portable. If the underlying platform ever turns hostile, I should be able to move the data with one CLI command.

Cloudflare Workers + D1 hits all three.

The architecture, in one diagram's worth of prose

Cloudflare Workers runs the MCP server (HTTP + JSON-RPC) at the edge. D1 is SQLite running on Cloudflare's edge — same SQL you'd write locally, replicated globally. The MCP server exposes six capabilities to any connected AI client: save a memory, search memories, save a decision, search decisions, save a project instruction, and read all instructions. The model decides which to call based on what you said in plain English. You never see the call.

That's it. There's no microservice mesh, no message queue, no background worker. The entire backend is one Worker file with six handlers and a D1 binding.

The reason this works is that AI memory is fundamentally a read-heavy, low-write workload. You save a memory once and read it dozens of times across sessions. SQLite at the edge is shaped exactly for that.
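To make that concrete, here is a minimal sketch of the Worker's shape, not Context Hub's actual source. The binding name (DB), the memories table, and the method names are illustrative, and the types come from @cloudflare/workers-types.

interface Env {
  DB: D1Database; // the D1 binding declared in wrangler.toml
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // MCP runs JSON-RPC over HTTP; the AI client picks the method.
    const rpc = await request.json<{ id: number; method: string; params: any }>();

    switch (rpc.method) {
      case "save_memory":
        await env.DB
          .prepare("INSERT INTO memories (content, project, created_at) VALUES (?, ?, ?)")
          .bind(rpc.params.content, rpc.params.project ?? null, Date.now())
          .run();
        return reply(rpc.id, { saved: true });

      case "search_memories": {
        const { results } = await env.DB
          .prepare("SELECT content, project FROM memories WHERE content LIKE ? LIMIT 20")
          .bind(`%${rpc.params.query}%`)
          .all();
        return reply(rpc.id, results);
      }

      // ...four more handlers: save_decision, search_decisions,
      // save_instruction, read_instructions.

      default:
        return reply(rpc.id, { error: `unknown method: ${rpc.method}` });
    }
  },
};

function reply(id: number, result: unknown): Response {
  return Response.json({ jsonrpc: "2.0", id, result });
}

A real MCP server wraps this in the protocol's tools/list and tools/call handshake, but the data path is exactly this thin.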

What about embeddings and semantic search?

Reasonable question. The hosted products lean hard on vector embeddings for "semantic recall." You'd think you need that for AI memory to feel smart.

You don't, and here's why: the AI client is doing the semantic work at retrieval time, not you.

When you ask Claude "what did we decide about the rollout strategy?", Claude has already parsed your question and chosen the search keywords before reaching out to Context Hub. The server doesn't need its own embedding model — it needs to return matching rows fast. SQLite with FTS5 (full-text search) is plenty for that, and it's free, and it ships with D1.

I tried adding embedding-based search in v0.1. The improvement over FTS5 was marginal for personal-scale memory (under 10k rows), and it tripled the infrastructure complexity. Cut it.
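For the curious, the FTS5 path on D1 is only a few lines. A sketch under assumed names (a memories_fts virtual table over content and project columns, not necessarily the real schema):

// One-time migration, e.g. via wrangler d1 execute:
//   CREATE VIRTUAL TABLE memories_fts USING fts5(content, project);

export async function searchMemories(db: D1Database, query: string) {
  // bm25() scores are more negative for better matches, so ascending
  // order surfaces the best hits first.
  const { results } = await db
    .prepare(
      `SELECT content, project, bm25(memories_fts) AS score
         FROM memories_fts
        WHERE memories_fts MATCH ?
        ORDER BY score
        LIMIT 20`
    )
    .bind(query)
    .all();
  return results;
}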

What setup actually looks like

$ npx create-context-hub

The CLI does these things, in order:

  1. Verifies you have a Cloudflare account and an authenticated wrangler CLI. If not, it walks you through wrangler login.
  2. Creates a new D1 database named context-hub in your account. Free tier is automatic — no plan upgrade needed.
  3. Runs the schema migration to create the four tables.
  4. Deploys the MCP server to a Cloudflare Workers domain (something like context-hub.your-subdomain.workers.dev).
  5. Prints copy-paste connection instructions for Claude.ai, Claude Code, ChatGPT, Cursor, and Perplexity.
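Under the hood, steps 2 through 4 mostly orchestrate standard wrangler commands, roughly:

$ wrangler d1 create context-hub
$ wrangler d1 execute context-hub --remote --file=./schema.sql
$ wrangler deploy

Here schema.sql stands in for the bundled migration. One plausible shape for the main table (illustrative; the actual migration creates four tables):

CREATE TABLE IF NOT EXISTS memories (
  id         INTEGER PRIMARY KEY AUTOINCREMENT,
  content    TEXT NOT NULL,      -- the fact itself
  project    TEXT,               -- optional grouping
  created_at INTEGER NOT NULL    -- unix epoch milliseconds
);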

Total time on a fresh machine: about 4 minutes. The slow step is waiting for Cloudflare to provision the D1 database, which takes 15–30 seconds the first time.
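For step 5, the connection config for a client like Claude Code amounts to an .mcp.json entry along these lines, where the URL and path are whatever the CLI printed for your deployment:

{
  "mcpServers": {
    "context-hub": {
      "type": "http",
      "url": "https://context-hub.your-subdomain.workers.dev/mcp"
    }
  }
}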

The uncomfortable truth about hosted AI memory

Most personal AI memory needs are simple enough that a managed service is overkill. You don't need real-time vector recall across a million memories. You don't need a multi-tenant permissions model. You don't need an admin dashboard with usage analytics.

You need: a place to put 200–2000 facts about your work, projects, and preferences, and a way for any AI client to read and write to it. That's a few SQLite tables on the edge.

Hosted services exist because setting up your own used to be hard. With MCP and Cloudflare's free tier, it isn't anymore.

When you should pay for hosted memory anyway

To be fair, there are real cases where Mem0 or Letta is correct:

  • Multi-user team memory — permissions, audit logs, SSO matter when you're sharing context across a company
  • Hybrid retrieval at scale — if you genuinely have millions of memories with vector + keyword + graph search, hosted products have spent real engineering on this
  • You don't want to touch infra ever — fair, but then you're paying $240+/year for that preference

For a single person who wants AI clients to stop forgetting them, none of those apply. Self-host. It's free. It's yours. It deploys in 4 minutes.

Frequently asked questions

Is Cloudflare's free tier really enough for AI memory?
For personal use, yes — comfortably. D1's free tier covers 5 million reads/day, 100k writes/day, and 5 GB storage. A heavy single-user memory workload (saving every project decision plus conversation context across 5 AI clients) lands around 2k reads + 200 writes per day, which is roughly 0.04% of the read quota and 0.2% of the write quota.
What does this actually cost if I exceed the free tier?
D1's paid tier is part of the Workers Paid plan: $5/month, which includes 25 billion row reads, 50 million row writes, 5 GB of storage, and 10 million Worker requests per month, with usage-based billing only beyond those allowances. So worst case for a heavy individual user: $5/month, and you'd need to be making millions of memory operations to get there. Most people stay at $0 indefinitely.
How does this compare to Mem0 or Letta?
Mem0 starts at $20/month; Letta varies. Both are managed services — they own your data, you rent access. Context Hub is structurally simpler: it's just an MCP server reading from a D1 table you own. No vendor lock-in. No row-export migration project if you ever leave. The tradeoff is that you're managing the deployment yourself, but the deployment is one npx command.
Will this break if Cloudflare changes their free tier?
The schema is portable SQLite. If Cloudflare ever changes terms, you export the D1 with a single wrangler command and re-host on Turso, libSQL, or local SQLite. The MCP protocol layer doesn't care what database is underneath. That's the point of building on open primitives.
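For reference, that export is one line:

$ wrangler d1 export context-hub --remote --output=./context-hub.sql

The result is plain SQL you can load into any SQLite-compatible engine.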

Ship in one command

Try Context Hub yourself.

One command. Every AI tool you use, finally on the same page.

$ npx create-context-hub