memorymcp is a modular Python library for persistent cognitive memory. Store facts, episodes, and context across sessions. Query with semantic search. Expose as an MCP server.
from memorymcp import MemoryFactory
pipeline = MemoryFactory.default()
await pipeline.store_fact(
"User prefers Python over JavaScript",
fact_type="preference",
importance=0.9,
)
facts = await pipeline.query_memory("coding preferences", top_k=5)
context = "\n".join(f.fact.content for f in facts)
from memorymcp.mcp_server import MemoryMCPServer
MemoryMCPServer(pipeline).run()
Modular by design — each memory tier and component is replaceable.
HotCache (LRU RAM) -> Working -> Episodic -> Semantic. Each tier has the right TTL and storage backend.
Powered by ragmcp. Store facts as vector embeddings. Query by meaning, not exact match.
Two optional retrieval upgrades, off by default. A cross-encoder reranker (fastembed, no torch) re-scores retrieved facts — measured recall@1 0.40 → 0.88. Adaptive query expansion rewrites weak queries and re-searches — measured recall@5 0.42 → 0.92.
Most memory systems call an LLM to extract facts from text — a paid API call every time you write. memorymcp's default extractor is deterministic regex (no LLM) and its default embedder runs locally (fastembed, on your CPU), so ingestion costs $0. Need to parse messy prose? Switch on the optional LLM extractor — then you pay per write, like the others.
16 MCP tools out of the box: query_memory, store_fact, get_session_context, add_episode, assemble_context, and more.
ADD/UPDATE/OVERWRITE/SKIP logic. Contradictions are detected and archived. Full timeline preserved.
Tri-modal decay (linear/exponential/anchored). Usage-based slowdown. Separate confidence tracking.
PII filter (email, phone, CC, SSN). GDPR Art.17 forget_user(). Per-namespace memory quotas.
MemoryFactory.default() works without any external service. Add Qdrant, Redis, Neo4j — all optional.
Redis for working memory, SQLite for local persistence, Neo4j for the fact graph, and pgvector or Chroma for semantic search.
No boilerplate. memorymcp handles the memory plumbing so your agent can focus on reasoning.
One pip install. Zero dependencies to start. Backends are optional extras.
Store facts with type, importance, and confidence. Consolidation engine handles duplicates automatically.
Retrieve relevant context by meaning. memorymcp searches all tiers and ranks by decay-adjusted score.
Expose as an MCP server, embed in your agent, or use as a standalone service.
import asyncio
from memorymcp import MemoryFactory
from pydantic_ai import Agent
async def main():
pipeline = MemoryFactory.default()
agent = Agent("openai:gpt-4o")
@agent.tool
async def remember(content: str, fact_type: str = "general") -> str:
await pipeline.store_fact(content, fact_type=fact_type, importance=0.8)
return f"Stored: {content}"
@agent.tool
async def recall(query: str, top_k: int = 5) -> str:
facts = await pipeline.query_memory(query, top_k=top_k)
return "\n".join(f"[{f.fact.fact_type}] {f.fact.content}" for f in facts)
ctx = await pipeline.assemble_context("coding preferences", session_id="sess_1", max_tokens=1000)
result = await agent.run("What do you know about my coding preferences?", system_prompt=ctx.render("markdown"))
print(result.data)
asyncio.run(main())
from memorymcp import MemoryFactory
from memorymcp.mcp_server import MemoryMCPServer
pipeline = MemoryFactory.default()
MemoryMCPServer(pipeline).run()
{
"mcpServers": {
"memorymcp": {
"command": "memorymcp",
"args": ["serve"],
"cwd": "/path/to/your/project"
}
}
}
{
"mcpServers": {
"memorymcp": {
"type": "sse",
"url": "http://localhost:8080/sse"
}
}
}
| Tier | Name | Backend | TTL | Purpose |
|---|---|---|---|---|
| 0 | HotCache | LRU RAM | seconds | Ultra-fast access to recently used facts |
| 1 | Working | InMemory | minutes | Current task context |
| 2 | Episodic | InMemory / DB | hours | Session history and episode summaries |
| 3 | Semantic | ragmcp (Qdrant / pgvector) | permanent | Long-term knowledge — queried by meaning |
memorymcp vs Mem0Same model, same controls, losses shown as plainly as wins. Every number reproduces from a script with raw JSON in the repo.
| Metric | memorymcp | Mem0 |
|---|---|---|
| recall@1 | 0.775 | 0.500 |
| recall@5 | 0.900 | 0.925 |
| MRR@10 | 0.826 | 0.679 |
| extraction cost / 8 facts | $0 | $0.071 |
Recall is close and split — memorymcp ranks the target higher (recall@1, MRR), Mem0 catches a couple more in the tail (recall@5). The clear, defensible win is cost: deterministic extraction is free where Mem0's LLM extraction is ~$0.009/fact. (With the optional cross-encoder reranker on, memorymcp leads every recall metric.)
One pip install. No external services required to start.