April 15, 2025 · gashel01

I Built a 10-Library AI Agent Suite on MCP — Here’s What I Learned

If you’ve built AI agents, you know the pain: you start with a simple LLM call, then you need memory, then file access, then code execution, then scheduling. Before you know it, you’re maintaining a Frankenstein stack of loosely coupled services held together by string formatting and prayer.

I spent the last few months building an alternative. Ten Python libraries, 120+ tools, one orchestration kernel — all speaking MCP (Model Context Protocol). This is what I learned.

What MCP Changes

MCP is Anthropic’s open standard for connecting AI models to tools and data sources. Think of it as USB-C for AI: one protocol, any tool, any model. Before MCP, every framework invented its own tool-calling abstraction. LangChain tools don’t work in CrewAI. AutoGen connectors don’t plug into Semantic Kernel. You pick a framework and you’re locked in.

MCP eliminates that. My libraries expose standard MCP servers. You can wire them into Claude Desktop, VS Code, or any MCP-compatible client — no adapter code needed.
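For readers new to the protocol, it helps to see what an MCP client and server actually exchange. They speak JSON-RPC 2.0, and tool use goes through the `tools/list` and `tools/call` methods. The toy dispatcher below is stdlib-only and only a sketch: the `echo` tool and the `handle` function are hypothetical, `tools/list` entries are trimmed to just a name, and a real server would use an MCP SDK over a transport such as stdio instead of hand-rolled dispatch.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[..., str]] = {
    "echo": lambda text: text,
}


def error(req_id: Any, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})


def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request string and return the response string."""
    req = json.loads(raw)
    method, params, req_id = req["method"], req.get("params", {}), req.get("id")
    if method == "tools/list":
        # Simplified: real entries also carry a description and an inputSchema.
        result: Any = {"tools": [{"name": name} for name in TOOLS]}
    elif method == "tools/call":
        fn = TOOLS.get(params.get("name", ""))
        if fn is None:
            return error(req_id, -32602, "unknown tool")
        text = fn(**params.get("arguments", {}))
        # Tool results come back as a list of typed content blocks.
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return error(req_id, -32601, "method not found")
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})


request = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
           "params": {"name": "echo", "arguments": {"text": "hello"}}}
print(handle(json.dumps(request)))
```

Because every client sends the same message shapes, any server that answers them correctly can be plugged into any MCP-compatible client.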

The Architecture

One orchestrator (kernelmcp) plus nine focused libraries — ten in total, each responsible for one concern:

                         +-----------------+
                         |    kernelmcp    |  Orchestrator
                         |  ReAct · LTP ·  |  + execution engine
                         |     Hybrid      |  (embeds ltpmcp)
                         +--------+--------+
                                  |  routes every tool call through one
                                  |  execute_tool() pipeline (budget,
                                  |  audit, circuit breaker, observability)
                                   |
                                   +-- memorymcp
                                   +-- planningmcp
                                   +-- workspacemcp
                                   +-- sandboxmcp
                                   +-- ragmcp
                                   +-- websearchmcp
                                   +-- schedulermcp
                                   +-- evalmcp (eval lib)

Plus dynamic MCP-client and LangChain bridges, so the same pipeline also reaches 2000+ community MCP servers, 500+ LangChain tools, and 80+ LangChain RAG loaders.
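The central idea in the diagram is that every tool call, whether local or bridged, passes through one `execute_tool()` pipeline. The kernel's implementation isn't shown in this post, so what follows is only a minimal sketch of such a choke point. It assumes a simple per-task call budget, an in-memory audit list, and a per-tool circuit breaker that opens after consecutive failures. `Pipeline`, `BudgetExceeded`, and `CircuitOpen` are hypothetical names.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


class BudgetExceeded(Exception):
    pass


class CircuitOpen(Exception):
    pass


@dataclass
class Pipeline:
    tools: dict[str, Callable[..., Any]]
    max_calls: int = 50            # hypothetical per-task budget, counted in tool calls
    failure_threshold: int = 3     # consecutive failures before a tool's breaker opens
    cooldown_s: float = 30.0       # how long an open breaker rejects calls
    calls: int = 0
    audit: list[dict[str, Any]] = field(default_factory=list)
    _failures: dict[str, int] = field(default_factory=dict)
    _opened_at: dict[str, float] = field(default_factory=dict)

    def execute_tool(self, name: str, **args: Any) -> Any:
        # 1. Budget: refuse once the task has used up its allowance.
        if self.calls >= self.max_calls:
            raise BudgetExceeded(name)
        # 2. Circuit breaker: fail fast while the tool is cooling down.
        #    After the cooldown the next call is let through (half-open).
        opened = self._opened_at.get(name)
        if opened is not None and time.monotonic() - opened < self.cooldown_s:
            raise CircuitOpen(name)
        self.calls += 1
        start = time.monotonic()
        try:
            result = self.tools[name](**args)
        except Exception as exc:
            self._failures[name] = self._failures.get(name, 0) + 1
            if self._failures[name] >= self.failure_threshold:
                self._opened_at[name] = time.monotonic()
            self.audit.append({"tool": name, "ok": False, "error": repr(exc)})
            raise
        # A success closes the breaker and resets the failure count.
        self._failures[name] = 0
        self._opened_at.pop(name, None)
        # 3. Audit / observability: record every call with its latency.
        self.audit.append({"tool": name, "ok": True,
                           "ms": (time.monotonic() - start) * 1000})
        return result


def flaky() -> str:
    raise RuntimeError("backend down")


pipe = Pipeline(tools={"add": lambda a, b: a + b, "flaky": flaky}, failure_threshold=2)
print(pipe.execute_tool("add", a=2, b=3))  # 5
```

Putting all three checks in one function means a bridged community server or a LangChain tool gets the same budget, breaker, and audit treatment as a native library, with no per-integration code.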

  • kernelmcp — The brain. ReAct / LTP / Hybrid execution engine, LLM routing (cloud/local/fast model fallback), budget enforcement, circuit breakers, audit logging.
  • ltpmcp — The Lean Task Protocol: a library that compiles a goal into a deterministic plan in one LLM call, then executes it step-by-step (variables, conditionals, FOREACH, parallel groups, ON_FAIL, RE-PLAN). Primarily a library the kernel embeds — it also ships a CLI and a small MCP server (parse/validate/visualize plans).
  • memorymcp — Episodic memory, working memory, semantic fact graph, contradiction detection, GDPR-compliant forget_user. Multiple storage backends (in-memory, SQLite, on-disk).
  • planningmcp — Goal decomposition into structured plans with step-by-step execution, cost estimation, auto-replan on failure, and built-in templates (deploy, migrate, audit…).
  • workspacemcp — Sandboxed file operations with tenant isolation, checkpoint/restore, semantic tree view, audit trail. DLP patterns block secrets from leaving.
  • sandboxmcp — Code execution (Python, Node, shell) with resource limits, network egress control, AST-based code validation, Docker hardening, and host access with approval gates.
  • websearchmcp — Web search (SearXNG, DuckDuckGo, Mojeek, Brave) with Playwright browser rendering, clean markdown extraction, and CAPTCHA detection.
  • ragmcp — Document ingestion (15 native loaders + 80+ via the LangChain bridge), vector search, Self-RAG, ReAct-RAG, evaluation, semantic routing, user profiles.
  • schedulermcp — Cron, interval, one-shot, and watch-based scheduling with a dead-letter queue and exactly-once semantics.
  • evalmcp — A benchmark/eval library + CLI (also ships an MCP server): suites, metrics, regression/CI, and model comparison.
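To make two of the memorymcp features above concrete (contradiction detection and a per-user forget), here is a toy in-memory fact store. It is an illustrative sketch, not memorymcp's API. `Fact`, `FactStore`, `remember`, and the subject/predicate/object layout are assumptions. Deleting entries from one dict is also far simpler than real erasure, which has to reach every backend, index, and cache.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    user_id: str
    subject: str
    predicate: str
    obj: str


class FactStore:
    """Toy semantic fact store keyed by (user, subject, predicate)."""

    def __init__(self) -> None:
        self._facts: dict[tuple[str, str, str], Fact] = {}
        self._by_user: dict[str, set[tuple[str, str, str]]] = defaultdict(set)

    def remember(self, fact: Fact) -> Optional[Fact]:
        """Store a fact and return the older fact it contradicts, if any.

        In this sketch a contradiction means the same user/subject/predicate
        already holds a different object (e.g. two different home cities).
        """
        key = (fact.user_id, fact.subject, fact.predicate)
        previous = self._facts.get(key)
        self._facts[key] = fact            # newest fact wins in this sketch
        self._by_user[fact.user_id].add(key)
        if previous is not None and previous.obj != fact.obj:
            return previous
        return None

    def forget_user(self, user_id: str) -> int:
        """Delete every fact belonging to one user; return how many were removed."""
        keys = self._by_user.pop(user_id, set())
        for key in keys:
            self._facts.pop(key, None)
        return len(keys)

    def facts_for(self, user_id: str) -> list[Fact]:
        return [self._facts[k] for k in self._by_user.get(user_id, set())]


store = FactStore()
store.remember(Fact("u1", "alice", "lives_in", "Paris"))
conflict = store.remember(Fact("u1", "alice", "lives_in", "Berlin"))
print(conflict.obj)                # Paris
print(store.forget_user("u1"))     # 1
```

A real system would also need a policy for what happens on conflict (ask the user, keep both with timestamps, prefer the more trusted source); this sketch simply keeps the newest value and reports the old one.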

Total: 3,400+ tests across the suite, all runnable offline (in-memory backends by default).

Key Design Decisions

Why 10 Libraries, Not 1

Monoliths are tempting. One pip install, one import, done. But agents don’t all need the same capabilities. A code review bot needs workspace + sandbox but not scheduling. A research assistant needs RAG + memory but not file access. Separate packages mean you install only what you need, and each library can evolve independently.

The kernel ties them together. KernelFactory.full_suite() wires everything in one call. KernelFactory.create(memory_pipeline=..., sandbox_pipeline=...) lets you pick exactly the pieces you want.

Why ReAct + LTP Hybrid

ReAct (Reason + Act) is great for open-ended tasks: the LLM decides what tool to call at each step, observes the result, and reasons about what to do next. But it’s expensive. A 10-step task means 10+ LLM calls just for routing.
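Here is the ReAct cost structure in miniature. This is a generic ReAct loop, not kernelmcp's engine, and `fake_llm` is a hypothetical stand-in for a model call that returns either a tool action or a final answer. The point is that the model is consulted again after every observation, so the LLM-call count grows with the number of steps.

```python
from typing import Callable

# A decision is either ("tool", tool_name, argument) or ("final", answer, "").
Decision = tuple[str, str, str]


def react(goal: str,
          llm: Callable[[str, list[str]], Decision],
          tools: dict[str, Callable[[str], str]],
          max_steps: int = 10) -> tuple[str, int]:
    """Run a minimal Reason+Act loop; return (answer, number_of_llm_calls)."""
    observations: list[str] = []
    for step in range(1, max_steps + 1):
        kind, a, b = llm(goal, observations)     # one LLM call per iteration
        if kind == "final":
            return a, step
        observations.append(f"{a}({b}) -> {tools[a](b)}")
    return "gave up", max_steps


def fake_llm(goal: str, observations: list[str]) -> Decision:
    # Hypothetical model: read two files one at a time, then answer.
    files = ["a.py", "b.py"]
    if len(observations) < len(files):
        return ("tool", "read_file", files[len(observations)])
    return ("final", f"summarized {len(observations)} files", "")


answer, calls = react("summarize the files", fake_llm,
                      {"read_file": lambda path: f"<contents of {path}>"})
print(answer, calls)  # summarized 2 files 3
```

Two tool calls cost three LLM calls here, and each of those calls re-sends the goal plus the growing observation history.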

LTP (Lean Task Protocol) compiles a goal into a structured plan in a single LLM call, then executes the steps mechanically. On our reproducible benchmark suite it cuts per-task tokens by roughly 65–73% on structured tasks (full numbers on the benchmarks page). But it's brittle when a step needs world knowledge the plan can't derive, or involves fiddly stateful manipulation.
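By contrast, LTP pays for the model once and then runs a plan with no further LLM calls. The real Lean Task Protocol syntax isn't shown in this post, so the plan below is a hypothetical list of dicts, hand-written where LTP would have the model produce it. The executor covers only a subset of the features listed earlier: variables (`$name` substitution via `string.Template`), a FOREACH step, and an ON_FAIL fallback.

```python
from string import Template
from typing import Any, Callable

Tool = Callable[[str], Any]


def run_plan(plan: list[dict[str, Any]], tools: dict[str, Tool]) -> dict[str, Any]:
    """Execute a pre-compiled plan deterministically; no LLM calls happen here."""
    env: dict[str, Any] = {}

    def call(step: dict[str, Any], arg: Any) -> Any:
        try:
            return tools[step["tool"]](arg)
        except Exception:
            fallback = step.get("on_fail")       # ON_FAIL: run a fallback tool instead
            if fallback is None:
                raise
            return tools[fallback](arg)

    for step in plan:
        if "foreach" in step:
            # FOREACH: apply the step to every item of a list-valued variable.
            env[step["save_as"]] = [call(step, item) for item in env[step["foreach"]]]
        else:
            # Variables: "$name" in an argument becomes an earlier step's output.
            arg = Template(step.get("arg", "")).safe_substitute(env)
            env[step["save_as"]] = call(step, arg)
    return env


def read_file(path: str) -> str:
    if path == "broken.py":
        raise OSError("unreadable")
    return f"code of {path}"


tools: dict[str, Tool] = {
    "list_files": lambda _: ["a.py", "broken.py"],
    "read_file": read_file,
    "skip": lambda path: f"skipped {path}",
    "summarize": lambda text: f"summary of {text}",
}

plan = [
    {"tool": "list_files", "save_as": "files"},
    {"tool": "read_file", "foreach": "files", "on_fail": "skip", "save_as": "contents"},
    {"tool": "summarize", "arg": "$contents", "save_as": "summary"},
]
print(run_plan(plan, tools)["contents"])  # ['code of a.py', 'skipped broken.py']
```

The trade-off described above is visible here: execution is cheap and predictable, but if `read_file` needed judgment the plan didn't anticipate, nothing in the executor could supply it.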

So in production we run the kernel in Hybrid mode (mode="hybrid"; ReAct is the default when no mode is set). Hybrid is result-driven, not a guess: clearly exploratory goals go straight to ReAct, and everything else runs LTP first; the engine then verifies the result and falls back to ReAct only if it's inadequate. That fallback is why Hybrid stays the most reliable mode (100% across model tiers in our tests) while keeping most of LTP's efficiency. A TaskSupervisor can also route between cloud, local, and fast models by complexity.
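The Hybrid control flow reduces to a few branches. This is a minimal sketch with hypothetical callables (`looks_exploratory`, `run_ltp`, `run_react`, `is_adequate`), not kernelmcp functions. The post doesn't say how the engine decides that a goal is exploratory or that a result is inadequate, so the sketch takes both as plain predicates supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    text: str
    mode: str  # which path produced the final answer


def run_hybrid(goal: str,
               looks_exploratory: Callable[[str], bool],
               run_ltp: Callable[[str], str],
               run_react: Callable[[str], str],
               is_adequate: Callable[[str, str], bool]) -> Result:
    # Clearly exploratory goals skip planning and go straight to ReAct.
    if looks_exploratory(goal):
        return Result(run_react(goal), "react")
    # Everything else tries the cheap single-plan path first...
    draft: Optional[str]
    try:
        draft = run_ltp(goal)
    except Exception:
        draft = None
    # ...and only pays for ReAct if the plan failed or its result is inadequate.
    if draft is not None and is_adequate(goal, draft):
        return Result(draft, "ltp")
    return Result(run_react(goal), "react-fallback")


res = run_hybrid(
    "count the .py files",
    looks_exploratory=lambda g: g.startswith("explore"),
    run_ltp=lambda g: "",                  # plan ran but produced nothing useful
    run_react=lambda g: "42 files",
    is_adequate=lambda g, r: bool(r.strip()),
)
print(res.mode, res.text)  # react-fallback 42 files
```

When the LTP path succeeds, the only extra cost over pure LTP is the adequacy check; ReAct's per-step cost is paid only on the goals that need it.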

Why AGPL

Honest answer: I want the code to stay open. If a company uses this to build a product, they should contribute back. If you're building internal tools or doing research, AGPL is unlikely to get in your way; its obligations center on sharing source when you distribute the software or offer a modified version to users over a network. Commercial licenses are available for those who need them.

The Hard Parts

Token efficiency was the biggest engineering challenge. A naive ReAct loop can burn through tens of thousands of tokens on a simple task because it dumps the full tool registry (120+ tool definitions) into every prompt. The kernel handles this in two ways. First, a search_tools meta-tool loads only the tools the LLM actually asks for, instead of all of them up front. Second, Anthropic prompt caching makes the large static prefix cheap to re-send, though only after I fixed a caching bug where the cache breakpoint was set at the wrong level and silently did nothing. (That bug only surfaced because I benchmarked cost honestly; more on that on the benchmarks page.)
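The search_tools idea is easy to picture: the prompt carries one small meta-tool, and the model pulls in full definitions on demand instead of receiving 120+ schemas every turn. The sketch below ranks a hypothetical four-tool registry by naive keyword overlap. How kernelmcp actually ranks tools isn't described here, and embeddings would be a natural alternative.

```python
import re

# Hypothetical registry; real entries would carry full JSON input schemas.
REGISTRY = {
    "read_file": {"description": "Read a text file from the workspace", "params": ["path"]},
    "write_file": {"description": "Write text to a file in the workspace", "params": ["path", "text"]},
    "web_search": {"description": "Search the web and return result links", "params": ["query"]},
    "schedule_job": {"description": "Schedule a cron or interval job", "params": ["cron", "task"]},
}

# The only definition sent with every prompt; everything else is fetched on demand.
SEARCH_TOOLS_DEF = {"name": "search_tools",
                    "description": "Find tools by keyword and return their full definitions",
                    "params": ["query", "limit"]}

STOPWORDS = {"a", "an", "the", "to", "and", "or", "in", "from", "of"}


def _words(text: str) -> set[str]:
    # Lowercase alphabetic tokens; underscores in tool names act as separators.
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def search_tools(query: str, limit: int = 3) -> list[dict]:
    """Return full definitions of the tools whose name/description best match the query."""
    wanted = _words(query)
    scored = []
    for name, spec in REGISTRY.items():
        overlap = len(wanted & (_words(name) | _words(spec["description"])))
        if overlap:
            scored.append((overlap, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))   # best match first, ties by name
    return [{"name": name, **REGISTRY[name]} for _, name in scored[:limit]]


print([tool["name"] for tool in search_tools("read a file")])  # ['read_file', 'write_file']
```

The per-turn prompt then grows with the number of tools the model actually requested, not with the size of the registry.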

Making ten libraries work together without tight coupling required discipline. Each library exposes a clean async Python API. The SuiteOrchestrator maps tool names to libraries through a routing table, and no library imports another library directly. If memorymcp isn't installed, the kernel just skips it: no crashes, no conditional imports in hot paths.
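The decoupling rule (route by tool name, never import a sibling library, skip whatever isn't installed) can be sketched with `importlib`. The post doesn't say how SuiteOrchestrator's routing table is keyed, so the prefix convention and the table contents below are hypothetical, with the stdlib `json` module standing in for an installed backend. Imports happen once at startup, so the per-call path is only a lookup.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Callable, Optional

# Hypothetical routing table: tool-name prefix -> module implementing those tools.
ROUTES = {
    "memory_": "memorymcp",
    "sandbox_": "sandboxmcp",
    "json_": "json",   # stdlib stand-in so the sketch always has one installed backend
}


def load_available(routes: dict[str, str]) -> dict[str, ModuleType]:
    """Import each backend once at startup; skip any that isn't installed."""
    loaded: dict[str, ModuleType] = {}
    for prefix, module_name in routes.items():
        if importlib.util.find_spec(module_name) is not None:
            loaded[prefix] = importlib.import_module(module_name)
    return loaded


def route(tool_name: str, loaded: dict[str, ModuleType]) -> Optional[Callable]:
    """Resolve a tool name to a callable, or None if its library is absent."""
    for prefix, module in loaded.items():
        if tool_name.startswith(prefix):
            return getattr(module, tool_name[len(prefix):], None)
    return None


loaded = load_available(ROUTES)
dumps = route("json_dumps", loaded)
print(dumps({"ok": True}))              # {"ok": true}
print(route("memory_store", loaded))    # None unless memorymcp is installed
```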

Security was non-negotiable. The sandbox runs code with resource limits (RAM, CPU, timeout). Network egress requires explicit domain approval. Host access (running commands on the actual machine) goes through a HostGuard with per-namespace approval gates. The workspace has DLP patterns that block API keys and secrets from being written to files.
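Two of these guards are simple enough to sketch: a DLP check that refuses writes containing secret-looking strings, and an egress check against an explicit domain allowlist. The patterns are illustrative assumptions (the well-known AWS access key ID shape, PEM private-key headers, and a generic `sk-` style key), not the workspace's actual rule set, and regex-based DLP will always miss some secrets.

```python
import re
from urllib.parse import urlparse

# Illustrative secret patterns only; real DLP rule sets are much larger.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "generic_api_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
}


class PolicyViolation(Exception):
    pass


def dlp_check(text: str) -> None:
    """Raise if the text about to be written looks like it contains a secret."""
    hits = [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]
    if hits:
        raise PolicyViolation(f"write blocked, matched: {', '.join(hits)}")


def egress_allowed(url: str, approved: set[str]) -> bool:
    """Allow a request only if its host is an approved domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in approved)


print(egress_allowed("https://pypi.org/simple/", {"pypi.org"}))   # True
print(egress_allowed("https://evil-pypi.org/", {"pypi.org"}))     # False
```

The suffix check uses a leading dot on purpose, so a look-alike host such as `evil-pypi.org` doesn't pass as `pypi.org`.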

Testing at scale meant 3,400+ tests that all need to run without external services. Every library defaults to in-memory backends, so tests are fast and isolated, and pytest runs the full suite in under 10 seconds.
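"In-memory by default" usually comes down to coding against a small storage interface and making the in-memory implementation the default argument. The sketch below uses `typing.Protocol`. `KVStore`, `MemoryKV`, `SQLiteKV`, and `Library` are hypothetical names, not the suite's actual classes.

```python
import sqlite3
from typing import Optional, Protocol


class KVStore(Protocol):
    def get(self, key: str) -> Optional[str]: ...
    def put(self, key: str, value: str) -> None: ...


class MemoryKV:
    """Default backend: a plain dict, so tests need no files or services."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class SQLiteKV:
    """Persistent backend with the same interface."""

    def __init__(self, path: str) -> None:
        self._db = sqlite3.connect(path)
        self._db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def get(self, key: str) -> Optional[str]:
        row = self._db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None

    def put(self, key: str, value: str) -> None:
        with self._db:  # the connection context manager commits on success
            self._db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))


class Library:
    """Hypothetical library component that falls back to the in-memory backend."""

    def __init__(self, store: Optional[KVStore] = None) -> None:
        self.store = store if store is not None else MemoryKV()


test_lib = Library()                        # tests: in-memory dict, nothing to clean up
prod_lib = Library(SQLiteKV(":memory:"))    # production would pass a file path instead
```

Because both backends satisfy the same protocol, the code under test is the code that ships; only the constructor argument differs.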

Try It

pip install mcpaisuite-kernelmcp mcpaisuite-memorymcp mcpaisuite-planningmcp mcpaisuite-workspacemcp mcpaisuite-sandboxmcp mcpaisuite-schedulermcp mcpaisuite-ragmcp

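A few lines get you a working agent (kernel.run() is a coroutine, so it runs inside an asyncio event loop):

import asyncio
from kernelmcp import KernelFactory

async def main():
    kernel = KernelFactory.full_suite(llm_model="claude-sonnet-4-6")
    task = await kernel.run("Analyze the Python files in the workspace and write a summary")
    print(task.summary)

asyncio.run(main())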

Or wire it into Claude Desktop via claude_desktop_config.json:

{
  "mcpServers": {
    "kernelmcp": {
      "command": "kernelmcp",
      "args": ["start"]
    }
  }
}

What’s Next

The immediate roadmap:

  • PyPI publication — all packages in the suite, with proper dependency pinning
  • More vector backends — Qdrant and Milvus support for RAG at scale
  • Streaming responses — SSE streaming from the ReAct loop for real-time UIs
  • Community templates — share planning templates across projects

If you’re building AI agents and tired of gluing frameworks together, give it a try. Issues and PRs welcome.