Python memory library · MCP-native · AGPL-3.0

Your agent's memory.
Persistent. Intelligent.
MCP-native.

memorymcp is a modular Python library for persistent cognitive memory. Store facts, episodes, and context across sessions. Query with semantic search. Expose as an MCP server.

agent_memory.py
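import asyncio

from memorymcp import MemoryFactory
from memorymcp.mcp_server import MemoryMCPServer

pipeline = MemoryFactory.default()

async def main():
    # Store a fact with a type and an importance score
    await pipeline.store_fact(
        "User prefers Python over JavaScript",
        fact_type="preference",
        importance=0.9,
    )

    # Query by meaning, then join the matches into a context string
    facts = await pipeline.query_memory("coding preferences", top_k=5)
    context = "\n".join(f.fact.content for f in facts)
    print(context)

asyncio.run(main())

# Expose the same pipeline as an MCP server
MemoryMCPServer(pipeline).run()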

4-tier memory · Semantic search via ragmcp · MCP stdio / SSE

ragmcp-powered · MCP-native · 16 MCP tools · AGPL-3.0 · Python 3.11+

Everything your agent needs
to remember and learn

Modular by design — each memory tier and component is replaceable.

4-Tier Memory

HotCache (LRU RAM) -> Working -> Episodic -> Semantic. Each tier has the right TTL and storage backend.

HotCache · Working · Episodic
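
The HotCache tier is described as an LRU cache in RAM with a short TTL. A minimal sketch of that idea, assuming a fixed capacity and a per-entry expiry; the class name, parameters, and eviction details are illustrative and not memorymcp's actual implementation:

```python
import time
from collections import OrderedDict


class HotCache:
    """Toy LRU cache with a per-entry TTL (hypothetical, not memorymcp's class)."""

    def __init__(self, max_items=2, ttl_seconds=30.0, clock=time.monotonic):
        self.max_items = max_items
        self.ttl_seconds = ttl_seconds
        self.clock = clock                      # injectable so tests can control time
        self._items = OrderedDict()             # key -> (expires_at, value)

    def put(self, key, value):
        self._items[key] = (self.clock() + self.ttl_seconds, value)
        self._items.move_to_end(key)            # mark as most recently used
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)     # evict the least recently used entry

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:          # expired: drop it and report a miss
            del self._items[key]
            return None
        self._items.move_to_end(key)            # a hit refreshes recency
        return value
```

A miss here would fall through to the next tier; the cache only decides what stays in RAM.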

Semantic Memory

Powered by ragmcp. Store facts as vector embeddings. Query by meaning, not exact match.

ragmcp · Qdrant · pgvector

Reranker & query expansion

Two optional retrieval upgrades, off by default. A cross-encoder reranker (fastembed, no torch) re-scores retrieved facts — measured recall@1 0.40 → 0.88. Adaptive query expansion rewrites weak queries and re-searches — measured recall@5 0.42 → 0.92.

enable_rerank · enable_query_expansion
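
To make "rewrites weak queries and re-searches" concrete, here is a toy sketch of adaptive query expansion. It assumes a token-overlap scorer, a hand-written synonym table, and a score threshold, all of which are stand-ins for illustration; memorymcp's own retrieval and rewriting logic is not shown on this page:

```python
def overlap_score(query, doc):
    """Cheap relevance score: fraction of query tokens found in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def search(query, docs, top_k=3):
    """Return the top_k (score, doc) pairs, best first."""
    scored = [(overlap_score(query, doc), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical expansion table; a real system might use a thesaurus or a model.
SYNONYMS = {"coding": ["programming"], "preferences": ["prefers"]}


def expand(query):
    """Append known synonyms to the original query tokens."""
    tokens = query.lower().split()
    extra = [syn for tok in tokens for syn in SYNONYMS.get(tok, [])]
    return " ".join(tokens + extra)


def adaptive_search(query, docs, top_k=3, min_score=0.3):
    """Search once; if the best hit is weak, rewrite the query and search again."""
    hits = search(query, docs, top_k)
    if not hits or hits[0][0] < min_score:
        hits = search(expand(query), docs, top_k)
    return hits
```

Strong queries pay for one search only; the second pass runs just when the first one comes back weak.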

$0 ingestion by default

Most memory systems call an LLM to extract facts from text — a paid API call every time you write. memorymcp's default extractor is deterministic regex (no LLM) and its default embedder runs locally (fastembed, on your CPU), so ingestion costs $0. Need to parse messy prose? Switch on the optional LLM extractor — then you pay per write, like the others.

No LLM · Local embeddings · LLM extractor optional
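
A rough illustration of what deterministic, regex-based extraction can look like. The patterns and the `extract_facts` helper below are hypothetical examples written for this sketch, not memorymcp's actual rules:

```python
import re

# Hypothetical patterns for illustration; the library's real rules are not shown here.
PATTERNS = [
    ("preference", re.compile(r"\bI (?:prefer|like|love) (?P<value>[^.,;!?]+)", re.I)),
    ("identity", re.compile(r"\bmy name is (?P<value>[^.,;!?]+)", re.I)),
    ("location", re.compile(r"\bI live in (?P<value>[^.,;!?]+)", re.I)),
]


def extract_facts(text):
    """Return (fact_type, value) pairs found by deterministic pattern matching."""
    facts = []
    for fact_type, pattern in PATTERNS:
        for match in pattern.finditer(text):
            facts.append((fact_type, match.group("value").strip()))
    return facts
```

The same input always yields the same facts, and no network call is involved. The trade-off is that phrasing outside the patterns is missed, which is the gap an optional LLM extractor would cover.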

MCP-Native

16 MCP tools out of the box: query_memory, store_fact, get_session_context, add_episode, assemble_context, and more.

stdio · SSE

Consolidation Engine

ADD/UPDATE/OVERWRITE/SKIP logic. Contradictions are detected and archived. Full timeline preserved.

ADD · UPDATE · OVERWRITE
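
One way to picture the four outcomes is a small decision function. The rules below are assumptions made for this sketch (facts keyed by subject, contradictions detected by differing values, confidence as the tie-breaker); the engine's real criteria are not documented on this page:

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    key: str
    value: str
    confidence: float


@dataclass
class Store:
    current: dict = field(default_factory=dict)   # key -> latest Fact
    archive: list = field(default_factory=list)   # superseded facts, oldest first

    def consolidate(self, new):
        """Apply a new fact and return the action taken."""
        old = self.current.get(new.key)
        if old is None:
            action = "ADD"
        elif old.value != new.value:
            self.archive.append(old)              # keep the contradicted fact on the timeline
            action = "OVERWRITE"
        elif new.confidence > old.confidence:
            action = "UPDATE"                     # same claim, stronger evidence
        else:
            return "SKIP"                         # duplicate adds nothing
        self.current[new.key] = new
        return action
```

Archiving instead of deleting is what keeps the timeline intact: the overwritten fact can still be inspected later.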

Decay & Confidence

Tri-modal decay (linear/exponential/anchored). Usage-based slowdown. Separate confidence tracking.

linear · exponential · adaptive
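
The decay modes can be sketched as three small functions. The formulas, parameter names, and defaults here are assumptions: in particular, "anchored" is read as decay that settles at a floor, and usage-based slowdown as a half-life that grows with each use. The library may define them differently:

```python
def linear_decay(age_days, lifetime_days=30.0):
    """Score falls in a straight line from 1.0 to 0.0 over lifetime_days."""
    return max(0.0, 1.0 - age_days / lifetime_days)


def exponential_decay(age_days, half_life_days=7.0, uses=0):
    """Score halves every half-life; each recorded use stretches the half-life."""
    effective_half_life = half_life_days * (1.0 + uses)   # usage-based slowdown (assumed form)
    return 0.5 ** (age_days / effective_half_life)


def anchored_decay(age_days, floor=0.2, half_life_days=7.0):
    """Exponential decay that levels off at `floor` instead of zero (assumed meaning)."""
    return floor + (1.0 - floor) * exponential_decay(age_days, half_life_days)
```

With these defaults a 7-day-old fact scores 0.5 under exponential decay, but a fact used once keeps that score until day 14.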

Privacy & Quotas

PII filter (email, phone, CC, SSN). GDPR Art.17 forget_user(). Per-namespace memory quotas.

GDPR Art.17 · PII filter
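
A PII filter for the four categories listed can be approximated with regular expressions. The patterns and the `redact` helper below are simplified assumptions for illustration (US-style SSNs and phone numbers, 16-digit cards, no checksum validation), not the filter memorymcp ships:

```python
import re

# Illustrative patterns only; a production filter needs stricter validation.
# Order matters: card numbers are masked before the looser phone pattern runs.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CC": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "PHONE": re.compile(r"(?<!\w)(?:\+?\d{1,3}[ -]?)?\(?\d{3}\)?[ -]?\d{3}[ -]?\d{4}\b"),
}


def redact(text):
    """Replace each detected PII span with a bracketed label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Running the filter before a fact is stored means the raw value never reaches any tier.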

Zero Config Start

MemoryFactory.default() works without any external service. Add Qdrant, Redis, Neo4j — all optional.

default() · create() · from_env()

Persistent Backends

Redis for working memory, SQLite for local persistence, Neo4j for the fact graph, and pgvector or Chroma for semantic search.

Redis · SQLite · Neo4j

Install once.
Remember what matters.

No boilerplate. memorymcp handles the memory plumbing so your agent can focus on reasoning.

1. Install

One pip install. Zero dependencies to start. Backends are optional extras.

pip install mcpaisuite-memorymcp

2. Store

Store facts with type, importance, and confidence. The consolidation engine handles duplicates automatically.

await pipeline.store_fact("User prefers Python", fact_type="preference")

3. Query

Retrieve relevant context by meaning. memorymcp searches all tiers and ranks by decay-adjusted score.

facts = await pipeline.query_memory("coding preferences", top_k=5)

4. Deploy

Expose as an MCP server, embed in your agent, or use as a standalone service.

MemoryMCPServer(pipeline).run()

See it in action

Python — pydantic-ai agent with memorymcp
import asyncio
from memorymcp import MemoryFactory
from pydantic_ai import Agent

async def main():
    pipeline = MemoryFactory.default()

    # Assemble stored context first so it can seed the agent's system prompt
    ctx = await pipeline.assemble_context("coding preferences", session_id="sess_1", max_tokens=1000)
    agent = Agent("openai:gpt-4o", system_prompt=ctx.render("markdown"))

    # tool_plain registers tools that do not take a RunContext argument
    @agent.tool_plain
    async def remember(content: str, fact_type: str = "general") -> str:
        """Store a fact in memory."""
        await pipeline.store_fact(content, fact_type=fact_type, importance=0.8)
        return f"Stored: {content}"

    @agent.tool_plain
    async def recall(query: str, top_k: int = 5) -> str:
        """Retrieve stored facts relevant to the query."""
        facts = await pipeline.query_memory(query, top_k=top_k)
        return "\n".join(f"[{f.fact.fact_type}] {f.fact.content}" for f in facts)

    result = await agent.run("What do you know about my coding preferences?")
    print(result.output)  # named result.data in older pydantic-ai releases

asyncio.run(main())

Python — MCP server
from memorymcp import MemoryFactory
from memorymcp.mcp_server import MemoryMCPServer

pipeline = MemoryFactory.default()
MemoryMCPServer(pipeline).run()
16 MCP tools: query_memory, store_fact, get_session_context, add_episode, assemble_context, set_working_memory, get_working_memory, forget_fact, memory_stats, memory_config, traverse_graph, find_contradictions, extract_facts, export_memory, import_memory, memory_analytics

stdio (recommended)

JSON
{
  "mcpServers": {
    "memorymcp": {
      "command": "memorymcp",
      "args": ["serve"],
      "cwd": "/path/to/your/project"
    }
  }
}

SSE (shared server)

JSON
{
  "mcpServers": {
    "memorymcp": {
      "type": "sse",
      "url": "http://localhost:8080/sse"
    }
  }
}

A dedicated store for
each tier of memory

| Tier | Name | Backend | TTL | Purpose |
|------|------|---------|-----|---------|
| 0 | HotCache | LRU RAM | seconds | Ultra-fast access to recently used facts |
| 1 | Working | InMemory | minutes | Current task context |
| 2 | Episodic | InMemory / DB | hours | Session history and episode summaries |
| 3 | Semantic | ragmcp (Qdrant / pgvector) | permanent | Long-term knowledge, queried by meaning |
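
A simplified picture of how a read might move through the tiers: check the fastest store first, fall through on a miss, and copy a hit into the hot tier. This sketch assumes exact-key lookup and promotion on hit, neither of which is confirmed by this page (the semantic tier in particular is queried by meaning, not by key):

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    name: str
    store: dict = field(default_factory=dict)


def lookup(tiers, key):
    """Return (value, tier_name) from the first tier holding key, or (None, None)."""
    for tier in tiers:
        if key in tier.store:
            value = tier.store[key]
            tiers[0].store[key] = value   # promote so the next read is a hot hit (assumed behaviour)
            return value, tier.name
    return None, None


tiers = [Tier("HotCache"), Tier("Working"), Tier("Episodic"), Tier("Semantic")]
tiers[3].store["favourite_language"] = "Python"

first = lookup(tiers, "favourite_language")    # served by the slowest tier
second = lookup(tiers, "favourite_language")   # served by the hot tier after promotion
```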

memorymcp vs Mem0

Same model, same controls, losses shown as plainly as wins. Every number reproduces from a script with raw JSON in the repo.

| Metric | memorymcp | Mem0 |
|--------|-----------|------|
| recall@1 | 0.775 | 0.500 |
| recall@5 | 0.900 | 0.925 |
| MRR@10 | 0.826 | 0.679 |
| extraction cost / 8 facts | $0 | $0.071 |

Recall is close and split — memorymcp ranks the target higher (recall@1, MRR), Mem0 catches a couple more in the tail (recall@5). The clear, defensible win is cost: deterministic extraction is free where Mem0's LLM extraction is ~$0.009/fact. (With the optional cross-encoder reranker on, memorymcp leads every recall metric.)
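
For readers unfamiliar with the metrics, this sketch shows how recall@k and MRR@10 are commonly computed when each query has a single correct fact, and where the per-fact cost comes from. The sample runs are made up for illustration and the single-target assumption may not match the benchmark script exactly:

```python
def recall_at_k(ranked_ids, target_id, k):
    """1.0 if the target appears in the top k results, else 0.0."""
    return 1.0 if target_id in ranked_ids[:k] else 0.0


def reciprocal_rank(ranked_ids, target_id, k=10):
    """1/rank of the target within the top k, or 0.0 if it is missing."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == target_id:
            return 1.0 / rank
    return 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Made-up runs: (ranked result ids, id of the correct fact).
runs = [(["a", "b", "c"], "a"), (["x", "y", "z"], "y"), (["p", "q", "r"], "s")]

recall_1 = mean(recall_at_k(r, t, 1) for r, t in runs)
recall_5 = mean(recall_at_k(r, t, 5) for r, t in runs)
mrr_10 = mean(reciprocal_rank(r, t) for r, t in runs)

# Per-fact cost from the table: $0.071 for 8 facts, roughly $0.009 each.
cost_per_fact = 0.071 / 8
```

recall@1 and MRR reward putting the right fact first, while recall@5 only asks that it appear somewhere in the top five. That is why the two systems can split the metrics.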

See the full benchmark →

Ready to give your agent
persistent memory?

One pip install. No external services required to start.

Read the docs · Star on GitHub