kernelmcp
⚙ 32 tools
kernelmcp
The sovereign orchestrator — connects all MCP AI suite libraries into an autonomous agent.
The brain of the MCP AI suite: ragmcp (knowledge) · memorymcp (memory) · planningmcp (reasoning) · workspacemcp (files) · sandboxmcp (execution) · schedulermcp (scheduling) · websearchmcp (search) · ltpmcp (protocols) · evalmcp (evaluation) · kernelmcp (orchestrator)
120+ native tools — and connects at runtime to any MCP server or LangChain tool (2000+ community MCP servers and 500+ LangChain tools are reachable, none bundled). 7 agent types. Multi-agent TaskForce with 5 patterns. ReAct + LTP hybrid engine. Dynamic MCP client. LangChain bridge. Self-hosted Hub connector (monitor + control your own kernels). Zero manual intervention.
Philosophy: “The orchestrator feels nothing. It reads the event, enforces the plan, delegates the execution, and reports the result.”
What is kernelmcp?
kernelmcp is the central nervous system that wires the entire MCP AI suite into a multi-agent orchestrator. It receives a goal, creates a plan, delegates work to specialized servers and agents, tracks costs, heals failures, and returns results — all through a ReAct + LTP (Lean Task Protocol) hybrid engine driven by an LLM.
Without kernelmcp, each library operates independently. With it, they become a coordinated multi-agent system:
Integration Matrix
| Capability | Library | How kernelmcp uses it |
|---|---|---|
| Knowledge retrieval | ragmcp | Searches documents before the LLM guesses. Self-RAG for verified answers. ReAct RAG for multi-step reasoning. RAGAS eval for quality measurement. |
| Persistent memory | memorymcp | Full context assembly (persona + working memory + episodes + facts) at task start. Auto-stores episodes and outcomes. Consolidation + decay engine. GDPR forget_user. |
| Plan decomposition | planningmcp | Creates and enforces step-by-step plans with templates (deploy, migrate, audit…). Cost estimation, validation, replan on failure. |
| File operations | workspacemcp | Reads, writes, checkpoints files with DLP secret detection and approval gates for sensitive files. |
| Code execution | sandboxmcp | Runs code in isolated sandboxes with auto-heal. Vault for secrets, artifact signing, code validation. |
| Task scheduling | schedulermcp | Cron, interval, and watch jobs for event-driven automation |
| Host access | kernelmcp | host_exec, host file tools with per-tenant HostGuard whitelist |
| Web search | websearchmcp | SearXNG self-hosted search + Playwright browser_fetch |
Governance & Security
Governance in kernelmcp is enforced at the execution layer — a code path that stops the agent regardless of what the LLM “decides” — not asked for in a prompt. Every tool call funnels through a single chokepoint where the rules are applied.
- DLP secret guard — every tool result is scanned and secrets (AWS keys, tokens, private keys, connection strings) are redacted before they reach the model, so it can’t leak what it never sees. Outbound tool calls whose arguments carry a secret are blocked at dispatch. Deterministic pattern matching, not LLM judgement. Opt-in:
enable_dlp=True/KERNELMCP_DLP; emitssecret.redacted/secret.blockedevents. - Provable plans — when running in LTP/hybrid mode, kernelmcp can statically verify the compiled plan against a policy before it executes a single step. Pass a
ltpmcp.PlanPolicyasplan_policy(opt-in; default off): the kernel runsverify_planon the plan and refuses it if it violates the policy — denied tools, a step budget, runtime escape hatches, a no-egress-after-sensitive-read data-flow rule, or secret/path arguments. An exfiltration plan is refused at 0 steps, before any tool runs. See ltpmcp → Provable plans. - Budget cap — hard token/cost ceiling per task; the loop stops the moment it’s exceeded.
- Loop detection — identical repeated tool calls are blocked (
TOOL_DISPATCH_BLOCKED). - Approval gates & checkpoints (via workspacemcp) — destructive file ops on sensitive paths require approval and are snapshotted first.
- Egress allowlist, RBAC, and a full audit trail — every block carries a reason and a correlation id, and is replayable.
Same model, same prompt — the only difference is whether the rules are enforced or merely prompted.
Multi-Agent TaskForce
The headline feature: compose multiple specialized agents into coordinated teams that collaborate on complex goals.
Creating a TaskForce
from kernelmcp import KernelFactory
from kernelmcp.agents.taskforce import TaskForce
kernel = KernelFactory.from_env()
registry = kernel._agent_registry
# Pre-built template (coding / research / writing / analysis)
tf = TaskForce.create("coding", goal="Build a REST API with auth", registry=registry)
# Or build a taskforce from explicit agents
from kernelmcp.agents.patterns import AgentConfig
tf = TaskForce(
agents=[
AgentConfig(type="research", role="Researcher"),
AgentConfig(type="code", role="Implementer"),
AgentConfig(type="file", role="Documenter"),
],
goal="Research and implement caching strategy",
pattern="sequential",
registry=registry,
)
result = await tf.run()
5 Execution Patterns
| Pattern | Description | Use case |
|---|---|---|
sequential | Agents execute one after another, each building on the previous result | Linear workflows, pipelines |
parallel | All agents execute concurrently, results are merged | Independent subtasks, speed |
supervisor | A supervisor agent delegates to workers and synthesizes results | Complex multi-step goals |
debate | Agents argue opposing positions, a judge selects the best answer | Decision-making, verification |
swarm | Agents self-organize dynamically based on the task state | Emergent collaboration |
Inter-Agent Infrastructure
| Component | Description |
|---|---|
| SharedMemory | Shared key-value store accessible by all agents in the taskforce |
| MessageBus | Agent-to-agent messaging for coordination and status updates |
| SandboxScope | Per-agent isolation — each agent gets its own sandbox, files, and context |
| Handoff | Structured context transfer when one agent hands off to another |
Pre-Built TaskForce Templates
| Template | Agents | Pattern | Purpose |
|---|---|---|---|
coding | code, code, file | sequential | Build software with tests and docs |
research | research, research, memory | sequential | Research a topic with verification and summary |
writing | research, custom, custom | sequential | Draft and edit written content from research |
analysis | research, code, custom | sequential | Gather data, analyze it, and write a report |
7 Agent Types
kernelmcp provides 7 specialized agent types, each with a focused tool set and constitution:
| Agent Type | Role | Key capabilities |
|---|---|---|
code | Software engineer | Write, edit, execute, debug code via sandboxmcp |
research | Researcher | Search the web (via websearchmcp), browse, RAG queries, synthesize |
file | File manager | Read, write, checkpoint, organize files via workspacemcp |
memory | Memory curator | Store, retrieve, consolidate, forget via memorymcp |
plan | Planner | Decompose goals, create plans, estimate costs via planningmcp |
custom | User-defined | Any combination of tools and constitution rules |
meta | Meta-agent | Analyze runs, improve constitutions, generate templates |
from kernelmcp import KernelFactory
# Spawn a specific agent type
kernel = KernelFactory.from_env()
result = await kernel.spawn_agent("research", goal="Find best practices for API rate limiting")
MetaAgent (Self-Improving)
The MetaAgent analyzes past runs and proposes improvements to the system itself:
- Failure analysis — examines failed or slow runs to identify root causes
- Constitution improvements — proposes rule changes based on observed failure patterns
- LTP template generation — auto-generates reusable LTP templates from successful runs
- Dry-run mode — preview proposed changes before applying them
# CLI: analyze recent runs and suggest improvements
kernelmcp improve --dry-run
# Apply improvements
kernelmcp improve
from kernelmcp.agents.meta_agent import MetaAgent
meta = MetaAgent(
llm=kernel._engine._llm,
orchestrator=kernel.orchestrator,
audit_logger=kernel._audit,
)
report = await meta.analyze(namespace="default", limit=200)
suggestions = await meta.suggest(report)
for s in suggestions:
print(f"[{s.type}] {s.content} (confidence={s.confidence:.2f})")
Observability
Full visibility into agent execution with tracing, analytics, and replay.
Tracer
Span-based tracing for every operation:
from kernelmcp.observability import Tracer
tracer = Tracer()
span = tracer.start_span("my_task", attributes={"model": "claude-sonnet-4-6"})
try:
result = await kernel.run("analyze code")
tracer.end_span(span, status="ok")
except Exception as exc:
tracer.end_span(span, status="error", error=str(exc))
raise
# Retrieve the trace tree for a task later
spans = tracer.get_trace(task_id="abc123")
Analytics
Built-in analytics for token efficiency, tool performance, agent performance, cost breakdown, and bottleneck detection:
from kernelmcp.observability import Analytics
# Analytics exposes static methods over recorded spans
spans = tracer.get_all_spans()
efficiency = Analytics.token_efficiency(spans) # total/avg tokens, wasted tokens
tools = Analytics.tool_performance(spans) # per-tool call count, success rate, latency
agents = Analytics.agent_performance(spans) # per-agent success rate, avg turns/tokens
costs = Analytics.cost_breakdown(spans) # cost by model, agent, tool
slow = Analytics.bottlenecks(spans, threshold_ms=5000)
# Or get everything at once from the tracer
report = tracer.get_analytics()
ReplayEngine
Replay and debug past runs:
from kernelmcp.observability import ReplayEngine
replay = ReplayEngine(audit_logger=kernel._audit, tracer=tracer)
# Register a completed task, then inspect it
replay.register_task(task)
timeline = replay.get_timeline(task_id="abc123")
state = replay.get_state_at(task_id="abc123", turn_index=5)
forked = replay.fork(task_id="abc123", from_turn=3)
diff = replay.compare(task_id_a="abc", task_id_b="def")
OTel Bridge (Optional)
Export traces to any OpenTelemetry-compatible backend (Jaeger, Zipkin, Grafana Tempo):
kernel = KernelFactory.create(otel_endpoint="http://localhost:4317")
Observability MCP Tools
| Tool | Description |
|---|---|
get_trace | Retrieve the full trace for a task |
get_analytics | Get analytics summary for a namespace |
compare_runs | Compare two task runs side by side |
Self-Hosted Hub (monitor & control your kernels)
Embed kernelmcp in your own apps and point each one at a self-hosted Hub to monitor your kernels from one place — and optionally control them. Monitoring is telemetry push over an outbound-only connection (no inbound port on your app); control is opt-in.
from kernelmcp import KernelFactory, connect_hub
kernel = KernelFactory.from_env()
# Monitoring only (always on once connected):
await connect_hub(kernel, hub_url="http://my-hub:8007", project="prod", api_key="kmh_...")
# ...or also let the Hub send commands to this kernel (opt-in):
await connect_hub(kernel, hub_url="http://my-hub:8007", project="prod",
api_key="kmh_...", allow_control=True)
# Use the kernel normally — finished tasks show up in your Hub.
connect_hub(...) is fail-safe and a no-op when unconfigured (it also reads
KERNELMCP_HUB_URL / KERNELMCP_HUB_KEY / KERNELMCP_HUB_PROJECT from the
environment), so it is always safe to call unconditionally. With allow_control=True
the Hub can send ping / stats / set_config / run / cancel commands; pass
run_handler(goal) to customize how run executes. Unlike a telemetry collector,
this is a control plane your own embedded kernels connect to — you self-host it
and keep your data.
Persistence
SQLite-based checkpointing for task state recovery.
StateManager
from kernelmcp.persistence.state_manager import StateManager
from kernelmcp.persistence.checkpoint import SQLiteCheckpointStore
# KernelFactory.create() already wires a StateManager as kernel._state_manager.
# To build one manually:
store = SQLiteCheckpointStore(db_path="kernel_state.db")
state_mgr = StateManager(store=store, kernel_pipeline=kernel)
# Manual checkpoint -- returns a checkpoint id
checkpoint_id = await state_mgr.checkpoint(label="before migration")
# Recover from latest checkpoint (or pass a specific checkpoint_id)
restored = await state_mgr.restore()
# List recent checkpoints
checkpoints = await state_mgr.list_checkpoints(limit=20)
Persistence MCP Tools
| Tool | Description |
|---|---|
checkpoint | Create a checkpoint for a running task |
restore | Restore a task from a checkpoint |
list_checkpoints | List available checkpoints |
A2A Protocol
Agent-to-Agent (A2A) protocol support for cross-agent interoperability.
Components
| Component | Description |
|---|---|
A2AServer | Exposes any kernel agent as an A2A endpoint (/.well-known/agent.json + /a2a/tasks) |
A2AClient | Discover and call external A2A agents by their well-known URL |
A2ABridge | Auto-creates MCP tools from discovered A2A agent skills |
Usage
from kernelmcp.a2a import A2AServer, A2AClient
from kernelmcp.a2a.bridge import A2ABridge # not re-exported from kernelmcp.a2a
# Expose this agent as an A2A endpoint
server = A2AServer(kernel, skills=["code_review", "testing"])
await server.start(port=8080)
# Now discoverable at http://localhost:8080/.well-known/agent.json
# Call an external A2A agent
client = A2AClient("https://other-agent.example.com")
card = await client.discover() # fetch agent card
result = await client.send_task("review this PR", skill="code_review")
# Bridge: auto-register external A2A skills as local MCP tools
bridge = A2ABridge(client, kernel.orchestrator)
tool_names = await bridge.register_agent("https://other-agent.example.com")
# Now callable as: kernel.call_tool("a2a_code_review", {...})
Multi-Modal
Vision and audio analysis capabilities exposed as MCP tools.
Inline vision (the agent sees images in its loop)
Beyond the analysis tools below, the ReAct loop is inline-multimodal: when a tool returns an image (a screenshot of a page, or read_file on a .png/.jpg), the pixels are injected into the conversation as real image blocks so the same reasoning model sees them on the next turn — not a separate text description. This is gated on the configured model supporting vision. It lets an agent verify a UI it built (screenshot("http://localhost:3000")) or inspect a frame it generated, then iterate: render → screenshot/read_file → look → fix.
Components
| Component | Description |
|---|---|
VisionAnalyzer | Image/screenshot analysis: analyze, analyze_screenshot, analyze_file, compare |
AudioTranscriber | Audio processing: transcribe, summarize |
MCP Tools
| Tool | Description |
|---|---|
analyze_image | Analyze an image from URL or base64 with a prompt |
analyze_screenshot | Capture and analyze a screenshot of a URL |
screenshot | Capture a URL (or local app) and return the image inline so the agent sees it directly |
Usage
from kernelmcp.multimodal import VisionAnalyzer, AudioTranscriber
# Vision
vision = VisionAnalyzer(kernel)
result = await vision.analyze_file("screenshot.png", question="Describe the UI layout")
diff = await vision.compare(["before.png", "after.png"], question="What changed?")
analysis = await vision.analyze_file("diagram.pdf", question="Extract the architecture")
# Audio
audio = AudioTranscriber(kernel)
transcript = await audio.transcribe("meeting.mp3")
summary = await audio.summarize("meeting.mp3", prompt="Key decisions and action items")
Enterprise
Role-based access control, cost allocation, and SLA monitoring for production deployments.
Components
| Component | Description |
|---|---|
RBACManager | 4 default roles (admin, operator, viewer, agent) with namespace + tool permissions |
CostAllocator | Per-namespace, per-agent, and per-tool cost tracking with budget alerts |
SLAMonitor | SLA rules, violation alerts, and webhook notifications |
Usage
from kernelmcp.enterprise.rbac import RBACManager, Permission
from kernelmcp.enterprise.cost_allocation import CostAllocator
from kernelmcp.enterprise.sla import SLAMonitor, SLARule
# RBAC -- define roles, assign users, then check permissions
rbac = RBACManager() # or RBACManager.default_roles() for admin/operator/viewer/agent
rbac.add_role("operator", [
Permission(role="operator", namespace="prod", tools=["run_task", "get_task_status"]),
])
rbac.assign_user("alice", "operator")
rbac.check("alice", tool="host_exec", namespace="prod") # -> False (denied)
# Cost allocation -- track spend per namespace/agent/tool
costs = CostAllocator()
await costs.set_budget(namespace="prod", max_cost=500.0)
await costs.record(namespace="prod", cost=0.42, agent_type="code", tool="execute_code")
report = await costs.get_report(namespace="prod") # spent, budget, within_budget
# SLA monitoring -- add rules, then check against live metrics
sla = SLAMonitor() # or SLAMonitor.default_sla() for sensible presets
sla.add_rule(SLARule(name="P95 Latency", metric="p95_latency_ms", threshold=5000, operator="lte"))
sla.add_rule(SLARule(name="Error Rate", metric="error_rate", threshold=0.01, operator="lte"))
sla.set_webhook("https://hooks.example.com/alerts")
alerts = await sla.check({"p95_latency_ms": 6200, "error_rate": 0.03})
Scaling
BudgetScaler
Control cost and concurrency at the system level:
from kernelmcp.scaling import BudgetScaler
scaler = BudgetScaler(
max_cost_per_hour=5.0, # USD ceiling per hour
max_tokens_per_hour=500000, # Token ceiling per hour
max_concurrent_agents=10, # Agent concurrency limit
)
kernel = KernelFactory.create(scaler=scaler)
Docker Compose
Production-ready stack with all services:
docker compose -f deploy/docker-compose.yml up
Kubernetes
Kubernetes manifests for cloud-native deployment:
kubectl apply -f deploy/k8s/
Streaming
Real-time task execution streaming via Server-Sent Events (SSE):
from kernelmcp.streaming import TaskStream
stream = TaskStream(kernel)
async for event in stream.run("build the API"):
print(f"[{event.type}] {event.data}")
# task.started, turn.completed, tool.called, task.completed, ...
SSE events are compatible with any SSE client (browser EventSource, curl, etc.).
ReAct Engine
The ReAct Engine is the heart of kernelmcp. It runs a recursive loop: ask the LLM, execute tool calls, feed results back, repeat until the task is complete or the budget is exhausted.
Loop features:
- Context compaction — when the conversation approaches the window size, older turns are summarized by the LLM into a single, fact-preserving summary turn (incremental, with a circuit breaker) instead of being silently truncated.
- In-loop todo list — for multi-step tasks the agent maintains a checklist via
write_todos(TodoWrite-style), re-injected each turn so it stays oriented; offered only for non-trivial tasks. - Inline multimodal — tool-returned images (screenshots, generated frames) are shown to the model directly (see Multi-Modal).
- Parallel read-only tools — side-effect-free tool calls issued together (read/grep/glob) run concurrently; stateful calls stay serial.
- Workspace awareness — a bounded top-level snapshot of the workspace is surfaced so the agent doesn’t guess whether a file exists.
- Build system prompt — Constitution + context
- Call LLM — with fallback
- If tool_calls — execute via orchestrator, auto-heal on failure, audit + cost tracking, then loop back to step 1
- If text response — task complete, store outcome in memory
LTP Engine (Lean Task Protocol)
The LTP engine compiles natural language plans into deterministic execution graphs. Unlike ReAct (which reasons at every step), LTP compiles once and executes deterministically — faster, cheaper, and predictable for structured tasks.
Hybrid mode (default)
Hybrid mode auto-selects between ReAct and LTP based on task structure:
# Hybrid mode (default) -- kernel decides
result = await kernel.run_task("deploy the API to staging and production")
# Force LTP mode
result = await kernel.run_task("deploy the API", mode="ltp")
# Force ReAct mode
result = await kernel.run_task("debug the auth bug", mode="react")
LTP directives
| Directive | Description | Example |
|---|---|---|
@PARALLEL | Execute steps concurrently | STEP 2a @PARALLEL: run tests |
ON_FAIL | Error handling per step | ON_FAIL: RE-PLAN |
FOREACH | Iterate over a collection | FOREACH env IN [staging, prod]: |
RE-PLAN | Dynamic replanning on failure | ON_FAIL: RE-PLAN |
| Dot notation | Access data from previous steps | {{step1.output.url}} |
| Type casting | Cast values between types | {{step2.count | int}} |
Compiled plan example
STEP 1: search_documents "API authentication patterns"
STEP 2a @PARALLEL: write_file api/auth.py
STEP 2b @PARALLEL: write_file tests/test_auth.py
STEP 3: execute_code "pytest tests/test_auth.py"
ON_FAIL: RE-PLAN
STEP 4: FOREACH env IN [staging, production]:
deploy --target {{env}} --artifact {{step2a.output.path}}
Agent-JIT cache (experimental, situational)
Many workloads repeat the same kind of task with different parameters (“sum of squares to 100”, then “…to 900”). Agent-JIT amortizes those repeats. It is experimental and off by default — a genuine win on the right workload, but not a universal speedup (measured details below).
- First sighting (cold). The task runs through the normal engine. Its winning solution (the
execute_codepattern) is cached, keyed by a semantic signature — the goal with numbers masked, embedded locally (fastembed) so paraphrases of the same family match. - Second sighting (shadow). The cached pattern is adapted to the new goal and run, and the cold engine runs too. Their outputs are compared deterministically (not by an LLM). A match marks the family trusted.
- Trusted (warm). Later instances skip cold reasoning entirely: adapt the validated pattern and execute it. Much cheaper, and safe — trust came from exact output comparison, not a model’s judgement.
- Correctness is safe; cost is not always. A warm answer is only used after deterministic output validation, and any mismatch falls back to the full engine — so it never ships an unvalidated answer. But it is not “never cheaper”: measured, a warm reuse is ~34× cheaper (~330 vs ~11k tokens) once engaged, yet engagement requires the family to reliably route through
execute_code(non-deterministic for simple tasks) and the cold runs to validate the shadow. On low-repetition or non-code-routed traffic it can be net-neutral to ~+15% (an un-amortized shadow pass). That’s why it’s experimental and off by default — turn it on for known repetitive, code-heavy workloads, not as a blanket optimization.
Off by default — opt in per kernel:
kernel = KernelFactory.from_env() # honours KERNELMCP_JIT=1
# or explicitly
kernel = KernelFactory.create(..., jit=True)
Inspect or clear the cache (optionally per namespace) from any surface:
kernel.jit_stats() # {"enabled": True, "families": 3, "trusted": 2, "total_hits": 41, ...}
kernel.jit_clear() # drop cached patterns; returns the count removed
kernelmcp run "sum of squares to 500" --jit
kernelmcp jit stats
kernelmcp jit clear
Also exposed as MCP tools (jit_stats, jit_clear), HTTP endpoints (GET /jit/stats, POST /jit/clear), and a toggle in the Hub’s Engine settings. Patterns persist to ~/.kernelmcp/jit_cache.json and are isolated per namespace (tenant-safe).
schedulermcp
Event-driven task scheduling with four schedule types:
| Type | Description | Example |
|---|---|---|
once | Run once at a specific time | run_at: "2026-04-28T09:00:00" |
cron | Cron expression schedule | cron_expr: "0 9 * * *" |
interval | Run every N seconds | interval_seconds: 300 |
watch | Trigger on condition change | watch_command + watch_condition |
Watch jobs
Watch jobs monitor a command’s output and trigger when conditions are met:
# Watch for errors in the log file
await kernel.call_tool("schedule_task", {
"name": "error_watcher",
"schedule_type": "watch",
"watch_command": "tail -1 /var/log/app.log",
"watch_condition": "contains:ERROR",
"task": "Analyze the error and suggest a fix"
})
Host System Access
Secure host commands and file operations via HostGuard whitelist:
| Tool | Description |
|---|---|
host_exec | Execute a whitelisted command on the host |
host_file_read | Read a file from the host filesystem |
host_file_write | Write content to a file on the host |
host_file_copy | Copy a file on the host |
host_file_list | List files in a host directory |
All operations are gated by HostGuard — commands and paths must be explicitly whitelisted. Unapproved operations are rejected.
Web Search & Browser Fetch (via websearchmcp)
The kernel delegates web search and browser fetch to websearchmcp, which provides:
- SearXNG — self-hosted meta search engine with multi-engine rotation (Google, DuckDuckGo, Brave, Bing). Privacy-respecting, no tracking.
- browser_fetch — Playwright-based headless browser for rendering JavaScript-heavy pages before extracting content.
# The kernel routes web tools to websearchmcp automatically
content = await kernel.call_tool("browser_fetch", {
"url": "https://example.com/dashboard",
"wait_for": "networkidle"
})
MCP Server Modes
kernelmcp exposes three MCP server modes via the --mode flag. Each mode changes what tools are exposed and how LLM reasoning works.
agent mode (default)
The kernel has its own LLM and drives the full ReAct/LTP loop internally. The client sends a goal, the kernel plans, executes, heals, and returns the result. 32 orchestration tools are exposed (run_task, spawn_agent, get_task_status, etc.).
Best for: autonomous agents, headless deployments, programmatic usage.
kernelmcp start --mode agent
Claude Desktop config:
{
"mcpServers": {
"kernelmcp": {
"command": "kernelmcp",
"args": ["start", "--mode", "agent"]
}
}
}
router mode
The client LLM drives reasoning — the kernel just routes tool calls to the correct sub-server. ALL 90+ suite tools are exposed directly to the client. Zero double API calls: the client’s LLM calls tools directly without an intermediary LLM layer.
Best for: Claude Desktop, Cursor, VS Code, or any MCP client with its own LLM.
kernelmcp start --mode router
Claude Desktop config:
{
"mcpServers": {
"kernelmcp": {
"command": "kernelmcp",
"args": ["start", "--mode", "router"]
}
}
}
sampling mode
Same as agent mode (kernel drives the ReAct/LTP loop), but the kernel uses the client’s LLM via MCP sampling instead of its own API key. The SamplingLLMGateway routes LLM calls through the host application (VS Code, AWS Bedrock, future Claude Desktop sampling).
Best for: environments where you don’t want to manage a separate API key, or when the client already has LLM access.
kernelmcp start --mode agent --sampling
Claude Desktop config:
{
"mcpServers": {
"kernelmcp": {
"command": "kernelmcp",
"args": ["start", "--mode", "agent", "--sampling"]
}
}
}
Mode comparison
| agent (default) | router | sampling | |
|---|---|---|---|
| Who reasons | Kernel’s LLM | Client’s LLM | Client’s LLM (via sampling) |
| Tools exposed | 32 orchestration tools | ALL 90+ suite tools | 32 orchestration tools |
| Double API calls | Yes (client + kernel) | No | No |
| Needs API key | Yes | No | No |
| ReAct/LTP loop | Kernel-driven | N/A (client drives) | Kernel-driven |
| Best for | Autonomous agents | Claude Desktop, Cursor, VS Code | No-API-key deployments |
MCP Sampling
The SamplingLLMGateway enables MCP Sampling passthrough — route LLM calls through the host application (VS Code, AWS Bedrock) instead of managing API keys directly.
kernel = KernelFactory.create(llm_gateway="sampling")
Tenant Isolation
workspacemcp supports per-tenant isolation for multi-tenant deployments:
workspace:
tenant_isolation: true
base_path: /data/tenants
Each tenant gets isolated file storage, checkpoints, and artifacts. Cross-tenant access is blocked.
Quick Start
3-line usage
from kernelmcp import KernelFactory
kernel = KernelFactory.from_env()
result = await kernel.run("analyze the auth module and suggest improvements")
print(result.summary)
Full suite (all 7 servers wired in-process)
from kernelmcp import KernelFactory
kernel = KernelFactory.full_suite(
llm_model="claude-sonnet-4-6",
api_key="sk-...",
namespace="my_project",
)
result = await kernel.run(
goal="migrate the database schema to v2",
budget_usd=1.0,
mode="hybrid",
)
print(f"Summary: {result.summary}")
print(f"Steps: {len(result.steps_taken)} | Cost: ${result.cost_usd:.4f}")
# Direct tool calls (no LLM)
await kernel.call_tool("schedule_task", {
"goal": "daily backup", "job_type": "cron", "cron": "0 2 * * *"
})
Multi-Agent TaskForce
from kernelmcp import KernelFactory
from kernelmcp.agents.taskforce import TaskForce
kernel = KernelFactory.from_env()
# Launch a pre-built taskforce (templates: coding, research, writing, analysis)
tf = TaskForce.create("coding", goal="Build a payment API", registry=kernel._agent_registry)
result = await tf.run()
print(f"Agents used: {len(result.agent_results)} | Tokens: {result.total_tokens}")
MCP Server
from kernelmcp.factory import KernelFactory
from kernelmcp.mcp_server import KernelMCPServer
kernel = KernelFactory.from_env()
KernelMCPServer(kernel).run()
Or from the command line:
# Agent mode (default) -- kernel drives ReAct/LTP, exposes 32 tools
kernelmcp start
# Router mode -- client LLM drives, ALL 90+ suite tools exposed, zero double API calls
kernelmcp start --mode router
# Sampling mode -- agent mode but uses client's LLM via MCP sampling
kernelmcp start --mode agent --sampling
Claude Desktop claude_desktop_config.json (router mode recommended):
{
"mcpServers": {
"kernelmcp": {
"command": "kernelmcp",
"args": ["start", "--mode", "router"]
}
}
}
See MCP Server Modes for details on all three modes.
Features
Multi-Agent & Orchestration
- :people_holding_hands: Multi-Agent TaskForce — compose agent teams with 5 patterns: sequential, parallel, supervisor, debate, swarm
- :busts_in_silhouette: 7 Agent Types — code, research, file, memory, plan, custom, meta — each with focused tools and constitution
- :jigsaw: SharedMemory + MessageBus — inter-agent collaboration with shared state and messaging
- :shield: SandboxScope — per-agent isolation for safe concurrent execution
- :arrows_counterclockwise: Handoff Protocol — structured context transfer between agents
- :brain: MetaAgent — self-improving: analyzes failures, proposes constitution changes, generates LTP templates
Engine
- :brain: ReAct + LTP Hybrid Engine — ReAct for exploratory tasks, LTP for structured execution, hybrid mode auto-selects
- :rocket: LTP Compiler — Lean Task Protocol: compile once, execute deterministically. @PARALLEL, ON_FAIL, FOREACH, RE-PLAN, dot notation, type casting
- :robot: Hybrid Router — routes simple tasks to local models, complex to cloud (cost optimization)
- :scroll: Constitution — hardcoded PM persona with rules across 5 suite servers constraining the LLM
- :wrench: Auto-Healing — detects execution failures and injects fix-retry prompts automatically
Observability & Debugging
- :mag_right: Tracer — span-based tracing for every operation with optional OTel bridge
- :bar_chart: Analytics — token efficiency, tool/agent performance, cost breakdown, bottleneck detection
- :rewind: ReplayEngine — timeline, state_at, fork, compare for post-mortem debugging
- :clipboard: Immutable Audit Trail — every LLM call and tool invocation logged with per-model cost tracking
- :satellite: 26 Event Types — async event bus with subscribe/emit/stream
- :ocean: TaskStream — real-time SSE streaming of task execution events
Persistence & Scaling
- :floppy_disk: SQLite Checkpointing — auto-checkpoint before tasks, restore from any checkpoint
- :chart_with_upwards_trend: BudgetScaler — max cost/tokens per hour, max concurrent agents
- :whale: Docker Compose — production-ready deployment stack
- :cloud: Kubernetes — manifests for cloud-native scaling
Resilience
- :moneybag: Budget Enforcer — hard caps on tokens and cost per task and per namespace
- :zap: LLM Fallback Chain — primary -> secondary -> tertiary model failover
- :repeat: Retry with Backoff — exponential backoff on transient failures
- :traffic_light: Rate Limiter — max tasks per minute per namespace
- :electric_plug: Circuit Breaker — disables a server after N consecutive failures, auto-resets
Infrastructure
- :link: Suite Orchestrator — wires all servers into a unified tool registry (120+ tools)
- :gear: KernelFactory —
default()/create()/from_env()/from_yaml()/full_suite() - :desktop_computer: MCP Server (3 modes) — agent (kernel-driven ReAct), router (client-driven, 90+ suite tools, zero double API calls), sampling (agent + client LLM)
- :keyboard: CLI — start, taskforce, templates, new, deploy, eval, improve, cost, and more
- :alarm_clock: schedulermcp — cron, interval, and watch jobs for event-driven automation
- :mag: Web Search — via websearchmcp (SearXNG, DuckDuckGo, Mojeek, Brave + Playwright browser rendering)
- :globe_with_meridians: Dynamic MCP tools — connect any external MCP server at runtime (stdio/SSE)
- :computer: Host Access — host_exec, host_file_read/write/copy/list with HostGuard security whitelist
- :handshake: MCP Sampling — SamplingLLMGateway for passthrough to VS Code / Bedrock / future Claude Desktop
- :lock: Tenant Isolation — workspacemcp per-tenant file, memory, and execution isolation
Knowledge & Memory
- :brain: Self-RAG — retrieve, generate, self-critique, re-retrieve for verified factual answers
- :zap: ReAct RAG — multi-step iterative reasoning with multiple searches for complex questions
- :bar_chart: RAGAS Eval — 5-metric evaluation (context relevancy, precision, faithfulness, answer correctness)
- :bust_in_silhouette: User Profiles — personalized search ranking based on user preferences
- :shield: GDPR forget_user — permanently delete all data for a namespace
- :file_folder: Folder & URL Ingest — ingest entire folders or download documents from URLs
- :memo: Auto-Episodes — conversation turns automatically stored as episodic memory
- :gear: Full Context Assembly — persona + working memory + episodes + facts + RAG docs injected at every task
Installation
# Core kernel (no suite libraries)
pip install mcpaisuite-kernelmcp
# With specific libraries
pip install "mcpaisuite-kernelmcp[memorymcp,planningmcp]"
# Full suite (all servers + webhooks + observability)
pip install "mcpaisuite-kernelmcp[all]"
# [all] pulls in: memorymcp, planningmcp, ragmcp, workspacemcp,
# sandboxmcp, schedulermcp + webhooks + REST API.
# (websearchmcp, ltpmcp, evalmcp are core deps — always installed.)
# Development
pip install -e ".[dev]"
Requirements: Python 3.11+
Constitution
The Constitution is a hardcoded system prompt that constrains every LLM call. It defines the PM persona and rule domains spanning all 5 suite servers:
| Domain | Key rules |
|---|---|
| Planning | ALWAYS create a plan before executing. Never skip steps. Fix failures before advancing. |
| Memory | Relevant memories are injected at task start. Store important outcomes for future reference. |
| Knowledge | Search ragmcp before guessing. Use document context to inform decisions. |
| Workspace | Route ALL code to write_file first. Create checkpoints before modifications. |
| Execution | Execute through sandboxmcp only. Debug failures: read -> fix -> re-execute. Never hardcode secrets. |
| General | Be concise. Ask for clarification on ambiguity. Prefer local models for simple lookups. |
The Constitution can be updated at runtime via the set_constitution MCP tool or programmatically:
kernel._engine._constitution.update_rules("Your custom rules here...")
Memory context and RAG context are injected into the system prompt dynamically at each turn.
Hybrid Router
The TaskSupervisor estimates task complexity (0.0 to 1.0) and routes to the optimal model:
| Complexity | Range | Model | Use case |
|---|---|---|---|
| Simple | < 0.3 | ollama/mistral (local) | Status checks, lookups, simple queries |
| Medium | 0.3 - 0.7 | claude-haiku-4-5 (fast) | Summaries, explanations, searches |
| Complex | > 0.7 | claude-sonnet-4-6 (cloud) | Code generation, architecture, multi-step tasks |
Complexity scoring uses keyword analysis, task length, multi-step indicators, and code-related terms. Routing can be disabled to always use the cloud model:
kernel = KernelFactory.create(enable_routing=False)
Auto-Healing
When a tool execution fails (specifically execute_code via sandboxmcp), the engine automatically injects a system prompt instructing the LLM to:
- Analyze the error from stderr
- Fix the code using
workspacemcp.edit_file - Re-execute using
sandboxmcp.execute_code - NOT advance the plan until
exit_code == 0
This creates a self-correcting loop without human intervention. Auto-healing can be disabled:
kernel = KernelFactory.create(auto_heal=False)
Resilience
kernelmcp includes three resilience layers that protect against failures and runaway costs:
Budget Enforcer
Hard caps on tokens and cost per task. When exceeded, the task is marked as failed immediately.
kernel = KernelFactory.create(
max_tokens_per_task=50000, # token ceiling
max_cost_per_task=1.0, # dollar ceiling
)
Per-namespace spend tracking is available via BudgetEnforcer.get_spent(namespace).
Rate Limiter
Limits the number of tasks per minute per namespace (default: 30/min). Tasks that exceed the limit are rejected immediately.
Circuit Breaker
Disables a server after N consecutive failures (default: 3). Automatically resets after a cooldown period (default: 60 seconds). Prevents cascading failures when a suite library is down.
LLM Fallback Chain
If the primary LLM fails, the engine falls back through a chain of models:
claude-sonnet-4-6 --> gpt-4o --> ollama/mistral
Retry with Backoff
Failed tool calls are retried with exponential backoff (base delay: 1s, max delay: 30s, default retries: 1).
Audit Trail
Every LLM call and tool invocation is logged as an immutable AuditEntry with:
- Task ID, action type, model name, tool name
- Token count and cost
- Success/failure status
- Namespace and timestamp
Two backends are available:
| Backend | Use case |
|---|---|
InMemoryAuditLogger | Development, testing |
SQLiteAuditLogger | Production — persistent, queryable |
Per-model cost tracking
# Get cost breakdown by model
costs = await audit.cost_by_model(namespace="default")
# {"claude-sonnet-4-6": {"cost": 0.42, "tokens": 12500}, "ollama/mistral": {"cost": 0.0, "tokens": 800}}
Querying the audit trail
# Recent entries
entries = await audit.query(namespace="default", action="llm_call", limit=50)
# Total entry count
count = await audit.count(namespace="default")
MCP Tools
The full suite exposes 120+ tools across all servers. The kernel itself provides orchestration, observability, and persistence tools via the MCP protocol (stdio transport):
Orchestration Tools
| Tool | Description |
|---|---|
run_task | Submit a task for autonomous execution |
get_task_status | Check task progress |
list_tasks | List all tasks |
cancel_task | Abort a running task |
get_turns | Get ReAct turn history |
spawn_agent | Spawn a specialized agent (code, research, file, memory, plan, custom, meta) |
create_taskforce | Create a multi-agent taskforce |
kernel_stats | Token costs, latency, model usage |
kernel_config | View kernel configuration |
set_constitution | Update PM rules |
trigger_webhook | Fire a webhook event |
health | Kernel health + server connections |
Observability Tools
| Tool | Description |
|---|---|
get_trace | Retrieve the full trace for a task |
get_analytics | Get analytics summary for a namespace |
compare_runs | Compare two task runs side by side |
Persistence Tools
| Tool | Description |
|---|---|
checkpoint | Create a checkpoint for a running task |
restore | Restore a task from a checkpoint |
list_checkpoints | List available checkpoints |
Agent & Background Tools
| Tool | Description |
|---|---|
list_agents | List available sub-agent types |
run_background | Run a task asynchronously, returns an operation ID |
get_operation | Check the status of a background operation |
list_operations | List all background operations |
Introspection Tools
| Tool | Description |
|---|---|
kernel_audit | View kernel task audit log (completions, failures, costs) |
improve | Analyze kernel performance and suggest improvements |
list_taskforce_templates | List available TaskForce templates with agent configs |
list_taskforce_examples | List pre-built taskforce configs (secure_coding, research_verify, enterprise) |
CLI
kernelmcp start # Start the kernel MCP server
kernelmcp start --mode router # Start in router mode
kernelmcp start --sampling # Use the client's LLM via MCP sampling
kernelmcp taskforce "Build REST API" --template coding # Launch a multi-agent taskforce
kernelmcp taskforce "Analyze sales data" --pattern debate # Custom pattern
kernelmcp templates # List available taskforce templates
kernelmcp new coding "Build a REST API" # Generate a TaskForce config file from a template
kernelmcp deploy taskforce.json # Deploy (run) a TaskForce from a JSON config file
kernelmcp eval --suite memory # Run evaluation benchmarks
kernelmcp improve --dry-run # Preview MetaAgent improvements
kernelmcp improve # Apply MetaAgent improvements
kernelmcp cost # Show token cost summary
kernelmcp run "migrate the database" # Execute a task interactively
kernelmcp agents # List available agent types
kernelmcp status # Show kernel health and connected servers
kernelmcp stop # Stop the kernel daemon
kernelmcp logs --tail 50 # View recent events
kernelmcp config # View kernel configuration
kernelmcp servers # List connected MCP servers and tool count
Events (26 types)
kernelmcp emits 26 event types through an async event bus. Subscribe to monitor task execution in real time.
| Event | Emitted when |
|---|---|
task.started | A task begins execution |
task.completed | A task finishes successfully |
task.failed | A task fails (budget, max turns, error) |
task.cancelled | A task is cancelled |
turn.started | A new ReAct turn begins |
turn.completed | A ReAct turn finishes |
tool.called | A tool is about to be executed |
tool.succeeded | A tool call returns successfully |
tool.failed | A tool call fails |
plan.enforced | A plan step is enforced |
auto_heal.triggered | Auto-healing activates after execution failure |
context.bootstrapped | Memory context is loaded at task start |
context.trimmed | Context window is trimmed to fit token limits |
llm.called | An LLM call is made |
llm.delta | A streaming token/delta is emitted |
llm.routed | The supervisor selects a model |
webhook.received | An external webhook is received |
taskforce.started | A TaskForce begins execution |
taskforce.completed | A TaskForce finishes successfully |
taskforce.failed | A TaskForce fails |
agent.handoff | One agent hands off context to another |
agent.message | An agent-to-agent message is sent |
from kernelmcp.events import kernel_event_bus
# Stream all events
async for event in kernel_event_bus.stream():
print(f"{event.type.value}: {event.data}")
# Subscribe with a queue
queue = kernel_event_bus.subscribe()
event = await queue.get()
Factory
KernelFactory provides multiple construction methods:
from kernelmcp import KernelFactory
# Zero config -- in-memory, no libraries, just LLM + ReAct
kernel = KernelFactory.default()
# From environment variables (KERNELMCP_MODEL, ANTHROPIC_API_KEY, etc.)
kernel = KernelFactory.from_env()
# From a YAML configuration file
kernel = KernelFactory.from_yaml("kernel_config.yaml")
# Full configuration
kernel = KernelFactory.create(
llm_model="claude-sonnet-4-6",
local_model="ollama/mistral",
fast_model="claude-haiku-4-5-20251001",
api_key="sk-...",
enable_routing=True,
max_turns=20,
max_tokens_per_task=50000,
max_cost_per_task=1.0,
auto_plan=True,
auto_heal=True,
auto_memory=True,
jit=False, # Agent-JIT: reuse shadow-validated solution patterns
nano=False, # fast path for trivial single-shot tasks
namespace="my_project",
memory_pipeline=memory, # from memorymcp
planning_pipeline=planning, # from planningmcp
rag_pipeline=rag, # from ragmcp
workspace_pipeline=workspace, # from workspacemcp
sandbox_pipeline=sandbox, # from sandboxmcp
)
# Full suite -- all 7 servers auto-detected and wired
kernel = KernelFactory.full_suite(
llm_model="claude-sonnet-4-6",
api_key="sk-...",
namespace="default",
)
Environment variables
| Variable | Default | Description |
|---|---|---|
KERNELMCP_MODEL | claude-sonnet-4-6 | Primary LLM model |
KERNELMCP_LOCAL_MODEL | ollama/mistral | Local model for simple tasks |
KERNELMCP_ROUTING | true | Enable hybrid routing |
KERNELMCP_MAX_TURNS | 20 | Maximum ReAct turns per task |
KERNELMCP_MAX_TOKENS | 50000 | Token budget per task |
KERNELMCP_JIT | false | Reuse shadow-validated solution patterns across repeated task families (Agent-JIT) |
KERNELMCP_NANO | false | Fast path for trivial single-shot tasks |
KERNELMCP_NAMESPACE | default | Default namespace |
ANTHROPIC_API_KEY | — | Anthropic API key |
OPENAI_API_KEY | — | OpenAI API key (fallback) |
Development
git clone https://github.com/gashel01/kernelmcp
cd kernelmcp
pip install -e ".[dev]"
# Run tests
pytest tests/ -v # 1285+ tests
# With coverage
pytest tests/ --cov=kernelmcp --cov-report=html
Project structure
kernelmcp/
core/
models.py -- Task, Turn, ToolCall, KernelConfig; AgentType (7 types: code, research, file, memory, plan, custom, meta)
engine.py -- ReAct engine (the autonomous execution loop)
ltp_runner.py -- LTP compiler + deterministic executor (@PARALLEL, ON_FAIL, FOREACH, RE-PLAN)
constitution.py -- PM persona and hardcoded LLM constraints
context.py -- Context window manager
audit.py -- InMemory + SQLite audit loggers
resilience.py -- Fallback chain, retry, rate limiter
bootstrap.py / tool_executor.py / tool_selection.py / nudges.py / ...
agents/
base.py -- Agent base + search_tools meta-tool
meta_agent.py -- MetaAgent (self-improving: failure analysis, constitution updates)
patterns.py -- 5 patterns: sequential, parallel, supervisor, debate, swarm
taskforce.py -- TaskForce orchestrator + templates
shared_memory.py / message_bus.py / sandbox_scope.py / graph_executor.py / registry.py
code_agent.py / research_agent.py / file_agent.py / memory_agent.py
routing/
supervisor.py -- Hybrid router (complexity estimation + model selection)
llm_gateway.py -- LLM abstraction layer (litellm)
sampling_gateway.py -- SamplingLLMGateway for MCP Sampling
observability/
tracer.py -- Span-based tracing + analytics
replay.py -- ReplayEngine: timeline, state_at, fork, compare
hub.py -- connect_hub() control-plane connector
otel.py -- OpenTelemetry export (optional)
integration/
orchestrator.py -- Suite orchestrator (wires all servers) + host tools (HostGuard via sandboxmcp)
mcp_client.py -- Dynamic MCP server connections
langchain_adapter.py -- LangChain tools bridge
enterprise/ -- RBACManager, CostAllocator, SLAMonitor
multimodal/ -- VisionAnalyzer, AudioTranscriber
persistence/ -- SQLite checkpointing and recovery
a2a/ -- A2AServer / A2AClient / A2ABridge
mcp_client/ -- MCP client utilities
triggers/ -- Trigger / scheduling support
api/ -- FastAPI app (optional)
scaling.py -- BudgetScaler: cost/token/concurrency limits
streaming.py -- TaskStream with SSE events
pipeline.py -- KernelPipeline, BudgetEnforcer, CircuitBreaker
events.py -- Event bus (26 event types)
factory.py -- KernelFactory
facade.py -- High-level facade
mcp_server.py -- MCP server (32 tools)
cli.py -- CLI commands
Ecosystem & Interoperability
kernelmcp is the only AI agent framework that natively connects to three ecosystems:
Dynamic MCP Client
Connect to any of the 2000+ MCP servers at runtime:
await kernel.orchestrator.connect_mcp_server("github", transport="stdio", command="github-mcp-server")
await kernel.orchestrator.connect_mcp_server("slack", transport="sse", url="http://localhost:8080/sse")
# Tools appear automatically — the LLM can use them immediately
LangChain Tool Bridge
Use any of LangChain’s 500+ tools without leaving kernelmcp:
from langchain_community.tools import WikipediaQueryRun
kernel.orchestrator.register_langchain_tool(WikipediaQueryRun())
# lc__wikipedia is now available to the engine
License
AGPL-3.0 — see LICENSE.
For commercial licensing (closed-source usage), contact the author.