Home / Docs / kernelmcp
On this page

kernelmcp

⚙ 32 tools

kernelmcp

The sovereign orchestrator — connects all MCP AI suite libraries into an autonomous agent.

Python 3.11+ License: AGPL-3.0 Tests: 1285+

The brain of the MCP AI suite: ragmcp (knowledge) · memorymcp (memory) · planningmcp (reasoning) · workspacemcp (files) · sandboxmcp (execution) · schedulermcp (scheduling) · websearchmcp (search) · ltpmcp (protocols) · evalmcp (evaluation) · kernelmcp (orchestrator)

120+ native tools — and connects at runtime to any MCP server or LangChain tool (2000+ community MCP servers and 500+ LangChain tools are reachable, none bundled). 7 agent types. Multi-agent TaskForce with 5 patterns. ReAct + LTP hybrid engine. Dynamic MCP client. LangChain bridge. Self-hosted Hub connector (monitor + control your own kernels). Zero manual intervention.

Philosophy: “The orchestrator feels nothing. It reads the event, enforces the plan, delegates the execution, and reports the result.”


What is kernelmcp?

kernelmcp is the central nervous system that wires the entire MCP AI suite into a multi-agent orchestrator. It receives a goal, creates a plan, delegates work to specialized servers and agents, tracks costs, heals failures, and returns results — all through a ReAct + LTP (Lean Task Protocol) hybrid engine driven by an LLM.

Without kernelmcp, each library operates independently. With it, they become a coordinated multi-agent system:

Integration Matrix

CapabilityLibraryHow kernelmcp uses it
Knowledge retrievalragmcpSearches documents before the LLM guesses. Self-RAG for verified answers. ReAct RAG for multi-step reasoning. RAGAS eval for quality measurement.
Persistent memorymemorymcpFull context assembly (persona + working memory + episodes + facts) at task start. Auto-stores episodes and outcomes. Consolidation + decay engine. GDPR forget_user.
Plan decompositionplanningmcpCreates and enforces step-by-step plans with templates (deploy, migrate, audit…). Cost estimation, validation, replan on failure.
File operationsworkspacemcpReads, writes, checkpoints files with DLP secret detection and approval gates for sensitive files.
Code executionsandboxmcpRuns code in isolated sandboxes with auto-heal. Vault for secrets, artifact signing, code validation.
Task schedulingschedulermcpCron, interval, and watch jobs for event-driven automation
Host accesskernelmcphost_exec, host file tools with per-tenant HostGuard whitelist
Web searchwebsearchmcpSearXNG self-hosted search + Playwright browser_fetch

Governance & Security

Governance in kernelmcp is enforced at the execution layer — a code path that stops the agent regardless of what the LLM “decides” — not asked for in a prompt. Every tool call funnels through a single chokepoint where the rules are applied.

  • DLP secret guard — every tool result is scanned and secrets (AWS keys, tokens, private keys, connection strings) are redacted before they reach the model, so it can’t leak what it never sees. Outbound tool calls whose arguments carry a secret are blocked at dispatch. Deterministic pattern matching, not LLM judgement. Opt-in: enable_dlp=True / KERNELMCP_DLP; emits secret.redacted / secret.blocked events.
  • Provable plans — when running in LTP/hybrid mode, kernelmcp can statically verify the compiled plan against a policy before it executes a single step. Pass a ltpmcp.PlanPolicy as plan_policy (opt-in; default off): the kernel runs verify_plan on the plan and refuses it if it violates the policy — denied tools, a step budget, runtime escape hatches, a no-egress-after-sensitive-read data-flow rule, or secret/path arguments. An exfiltration plan is refused at 0 steps, before any tool runs. See ltpmcp → Provable plans.
  • Budget cap — hard token/cost ceiling per task; the loop stops the moment it’s exceeded.
  • Loop detection — identical repeated tool calls are blocked (TOOL_DISPATCH_BLOCKED).
  • Approval gates & checkpoints (via workspacemcp) — destructive file ops on sensitive paths require approval and are snapshotted first.
  • Egress allowlist, RBAC, and a full audit trail — every block carries a reason and a correlation id, and is replayable.

Same model, same prompt — the only difference is whether the rules are enforced or merely prompted.


Multi-Agent TaskForce

The headline feature: compose multiple specialized agents into coordinated teams that collaborate on complex goals.

Creating a TaskForce

from kernelmcp import KernelFactory
from kernelmcp.agents.taskforce import TaskForce

kernel = KernelFactory.from_env()
registry = kernel._agent_registry

# Pre-built template (coding / research / writing / analysis)
tf = TaskForce.create("coding", goal="Build a REST API with auth", registry=registry)

# Or build a taskforce from explicit agents
from kernelmcp.agents.patterns import AgentConfig

tf = TaskForce(
    agents=[
        AgentConfig(type="research", role="Researcher"),
        AgentConfig(type="code", role="Implementer"),
        AgentConfig(type="file", role="Documenter"),
    ],
    goal="Research and implement caching strategy",
    pattern="sequential",
    registry=registry,
)

result = await tf.run()

5 Execution Patterns

PatternDescriptionUse case
sequentialAgents execute one after another, each building on the previous resultLinear workflows, pipelines
parallelAll agents execute concurrently, results are mergedIndependent subtasks, speed
supervisorA supervisor agent delegates to workers and synthesizes resultsComplex multi-step goals
debateAgents argue opposing positions, a judge selects the best answerDecision-making, verification
swarmAgents self-organize dynamically based on the task stateEmergent collaboration

Inter-Agent Infrastructure

ComponentDescription
SharedMemoryShared key-value store accessible by all agents in the taskforce
MessageBusAgent-to-agent messaging for coordination and status updates
SandboxScopePer-agent isolation — each agent gets its own sandbox, files, and context
HandoffStructured context transfer when one agent hands off to another

Pre-Built TaskForce Templates

TemplateAgentsPatternPurpose
codingcode, code, filesequentialBuild software with tests and docs
researchresearch, research, memorysequentialResearch a topic with verification and summary
writingresearch, custom, customsequentialDraft and edit written content from research
analysisresearch, code, customsequentialGather data, analyze it, and write a report

7 Agent Types

kernelmcp provides 7 specialized agent types, each with a focused tool set and constitution:

Agent TypeRoleKey capabilities
codeSoftware engineerWrite, edit, execute, debug code via sandboxmcp
researchResearcherSearch the web (via websearchmcp), browse, RAG queries, synthesize
fileFile managerRead, write, checkpoint, organize files via workspacemcp
memoryMemory curatorStore, retrieve, consolidate, forget via memorymcp
planPlannerDecompose goals, create plans, estimate costs via planningmcp
customUser-definedAny combination of tools and constitution rules
metaMeta-agentAnalyze runs, improve constitutions, generate templates
from kernelmcp import KernelFactory

# Spawn a specific agent type
kernel = KernelFactory.from_env()
result = await kernel.spawn_agent("research", goal="Find best practices for API rate limiting")

MetaAgent (Self-Improving)

The MetaAgent analyzes past runs and proposes improvements to the system itself:

  • Failure analysis — examines failed or slow runs to identify root causes
  • Constitution improvements — proposes rule changes based on observed failure patterns
  • LTP template generation — auto-generates reusable LTP templates from successful runs
  • Dry-run mode — preview proposed changes before applying them
# CLI: analyze recent runs and suggest improvements
kernelmcp improve --dry-run

# Apply improvements
kernelmcp improve
from kernelmcp.agents.meta_agent import MetaAgent

meta = MetaAgent(
    llm=kernel._engine._llm,
    orchestrator=kernel.orchestrator,
    audit_logger=kernel._audit,
)

report = await meta.analyze(namespace="default", limit=200)
suggestions = await meta.suggest(report)
for s in suggestions:
    print(f"[{s.type}] {s.content} (confidence={s.confidence:.2f})")

Observability

Full visibility into agent execution with tracing, analytics, and replay.

Tracer

Span-based tracing for every operation:

from kernelmcp.observability import Tracer

tracer = Tracer()
span = tracer.start_span("my_task", attributes={"model": "claude-sonnet-4-6"})
try:
    result = await kernel.run("analyze code")
    tracer.end_span(span, status="ok")
except Exception as exc:
    tracer.end_span(span, status="error", error=str(exc))
    raise

# Retrieve the trace tree for a task later
spans = tracer.get_trace(task_id="abc123")

Analytics

Built-in analytics for token efficiency, tool performance, agent performance, cost breakdown, and bottleneck detection:

from kernelmcp.observability import Analytics

# Analytics exposes static methods over recorded spans
spans = tracer.get_all_spans()

efficiency = Analytics.token_efficiency(spans)   # total/avg tokens, wasted tokens
tools = Analytics.tool_performance(spans)         # per-tool call count, success rate, latency
agents = Analytics.agent_performance(spans)       # per-agent success rate, avg turns/tokens
costs = Analytics.cost_breakdown(spans)           # cost by model, agent, tool
slow = Analytics.bottlenecks(spans, threshold_ms=5000)

# Or get everything at once from the tracer
report = tracer.get_analytics()

ReplayEngine

Replay and debug past runs:

from kernelmcp.observability import ReplayEngine

replay = ReplayEngine(audit_logger=kernel._audit, tracer=tracer)

# Register a completed task, then inspect it
replay.register_task(task)

timeline = replay.get_timeline(task_id="abc123")
state = replay.get_state_at(task_id="abc123", turn_index=5)
forked = replay.fork(task_id="abc123", from_turn=3)
diff = replay.compare(task_id_a="abc", task_id_b="def")

OTel Bridge (Optional)

Export traces to any OpenTelemetry-compatible backend (Jaeger, Zipkin, Grafana Tempo):

kernel = KernelFactory.create(otel_endpoint="http://localhost:4317")

Observability MCP Tools

ToolDescription
get_traceRetrieve the full trace for a task
get_analyticsGet analytics summary for a namespace
compare_runsCompare two task runs side by side

Self-Hosted Hub (monitor & control your kernels)

Embed kernelmcp in your own apps and point each one at a self-hosted Hub to monitor your kernels from one place — and optionally control them. Monitoring is telemetry push over an outbound-only connection (no inbound port on your app); control is opt-in.

from kernelmcp import KernelFactory, connect_hub

kernel = KernelFactory.from_env()

# Monitoring only (always on once connected):
await connect_hub(kernel, hub_url="http://my-hub:8007", project="prod", api_key="kmh_...")

# ...or also let the Hub send commands to this kernel (opt-in):
await connect_hub(kernel, hub_url="http://my-hub:8007", project="prod",
                  api_key="kmh_...", allow_control=True)

# Use the kernel normally — finished tasks show up in your Hub.

connect_hub(...) is fail-safe and a no-op when unconfigured (it also reads KERNELMCP_HUB_URL / KERNELMCP_HUB_KEY / KERNELMCP_HUB_PROJECT from the environment), so it is always safe to call unconditionally. With allow_control=True the Hub can send ping / stats / set_config / run / cancel commands; pass run_handler(goal) to customize how run executes. Unlike a telemetry collector, this is a control plane your own embedded kernels connect to — you self-host it and keep your data.


Persistence

SQLite-based checkpointing for task state recovery.

StateManager

from kernelmcp.persistence.state_manager import StateManager
from kernelmcp.persistence.checkpoint import SQLiteCheckpointStore

# KernelFactory.create() already wires a StateManager as kernel._state_manager.
# To build one manually:
store = SQLiteCheckpointStore(db_path="kernel_state.db")
state_mgr = StateManager(store=store, kernel_pipeline=kernel)

# Manual checkpoint -- returns a checkpoint id
checkpoint_id = await state_mgr.checkpoint(label="before migration")

# Recover from latest checkpoint (or pass a specific checkpoint_id)
restored = await state_mgr.restore()

# List recent checkpoints
checkpoints = await state_mgr.list_checkpoints(limit=20)

Persistence MCP Tools

ToolDescription
checkpointCreate a checkpoint for a running task
restoreRestore a task from a checkpoint
list_checkpointsList available checkpoints

A2A Protocol

Agent-to-Agent (A2A) protocol support for cross-agent interoperability.

Components

ComponentDescription
A2AServerExposes any kernel agent as an A2A endpoint (/.well-known/agent.json + /a2a/tasks)
A2AClientDiscover and call external A2A agents by their well-known URL
A2ABridgeAuto-creates MCP tools from discovered A2A agent skills

Usage

from kernelmcp.a2a import A2AServer, A2AClient
from kernelmcp.a2a.bridge import A2ABridge  # not re-exported from kernelmcp.a2a

# Expose this agent as an A2A endpoint
server = A2AServer(kernel, skills=["code_review", "testing"])
await server.start(port=8080)
# Now discoverable at http://localhost:8080/.well-known/agent.json

# Call an external A2A agent
client = A2AClient("https://other-agent.example.com")
card = await client.discover()  # fetch agent card
result = await client.send_task("review this PR", skill="code_review")

# Bridge: auto-register external A2A skills as local MCP tools
bridge = A2ABridge(client, kernel.orchestrator)
tool_names = await bridge.register_agent("https://other-agent.example.com")
# Now callable as: kernel.call_tool("a2a_code_review", {...})

Multi-Modal

Vision and audio analysis capabilities exposed as MCP tools.

Inline vision (the agent sees images in its loop)

Beyond the analysis tools below, the ReAct loop is inline-multimodal: when a tool returns an image (a screenshot of a page, or read_file on a .png/.jpg), the pixels are injected into the conversation as real image blocks so the same reasoning model sees them on the next turn — not a separate text description. This is gated on the configured model supporting vision. It lets an agent verify a UI it built (screenshot("http://localhost:3000")) or inspect a frame it generated, then iterate: render → screenshot/read_file → look → fix.

Components

ComponentDescription
VisionAnalyzerImage/screenshot analysis: analyze, analyze_screenshot, analyze_file, compare
AudioTranscriberAudio processing: transcribe, summarize

MCP Tools

ToolDescription
analyze_imageAnalyze an image from URL or base64 with a prompt
analyze_screenshotCapture and analyze a screenshot of a URL
screenshotCapture a URL (or local app) and return the image inline so the agent sees it directly

Usage

from kernelmcp.multimodal import VisionAnalyzer, AudioTranscriber

# Vision
vision = VisionAnalyzer(kernel)
result = await vision.analyze_file("screenshot.png", question="Describe the UI layout")
diff = await vision.compare(["before.png", "after.png"], question="What changed?")
analysis = await vision.analyze_file("diagram.pdf", question="Extract the architecture")

# Audio
audio = AudioTranscriber(kernel)
transcript = await audio.transcribe("meeting.mp3")
summary = await audio.summarize("meeting.mp3", prompt="Key decisions and action items")

Enterprise

Role-based access control, cost allocation, and SLA monitoring for production deployments.

Components

ComponentDescription
RBACManager4 default roles (admin, operator, viewer, agent) with namespace + tool permissions
CostAllocatorPer-namespace, per-agent, and per-tool cost tracking with budget alerts
SLAMonitorSLA rules, violation alerts, and webhook notifications

Usage

from kernelmcp.enterprise.rbac import RBACManager, Permission
from kernelmcp.enterprise.cost_allocation import CostAllocator
from kernelmcp.enterprise.sla import SLAMonitor, SLARule

# RBAC -- define roles, assign users, then check permissions
rbac = RBACManager()  # or RBACManager.default_roles() for admin/operator/viewer/agent
rbac.add_role("operator", [
    Permission(role="operator", namespace="prod", tools=["run_task", "get_task_status"]),
])
rbac.assign_user("alice", "operator")
rbac.check("alice", tool="host_exec", namespace="prod")  # -> False (denied)

# Cost allocation -- track spend per namespace/agent/tool
costs = CostAllocator()
await costs.set_budget(namespace="prod", max_cost=500.0)
await costs.record(namespace="prod", cost=0.42, agent_type="code", tool="execute_code")
report = await costs.get_report(namespace="prod")  # spent, budget, within_budget

# SLA monitoring -- add rules, then check against live metrics
sla = SLAMonitor()  # or SLAMonitor.default_sla() for sensible presets
sla.add_rule(SLARule(name="P95 Latency", metric="p95_latency_ms", threshold=5000, operator="lte"))
sla.add_rule(SLARule(name="Error Rate", metric="error_rate", threshold=0.01, operator="lte"))
sla.set_webhook("https://hooks.example.com/alerts")
alerts = await sla.check({"p95_latency_ms": 6200, "error_rate": 0.03})

Scaling

BudgetScaler

Control cost and concurrency at the system level:

from kernelmcp.scaling import BudgetScaler

scaler = BudgetScaler(
    max_cost_per_hour=5.0,       # USD ceiling per hour
    max_tokens_per_hour=500000,  # Token ceiling per hour
    max_concurrent_agents=10,    # Agent concurrency limit
)
kernel = KernelFactory.create(scaler=scaler)

Docker Compose

Production-ready stack with all services:

docker compose -f deploy/docker-compose.yml up

Kubernetes

Kubernetes manifests for cloud-native deployment:

kubectl apply -f deploy/k8s/

Streaming

Real-time task execution streaming via Server-Sent Events (SSE):

from kernelmcp.streaming import TaskStream

stream = TaskStream(kernel)
async for event in stream.run("build the API"):
    print(f"[{event.type}] {event.data}")
    # task.started, turn.completed, tool.called, task.completed, ...

SSE events are compatible with any SSE client (browser EventSource, curl, etc.).


ReAct Engine

The ReAct Engine is the heart of kernelmcp. It runs a recursive loop: ask the LLM, execute tool calls, feed results back, repeat until the task is complete or the budget is exhausted.

Loop features:

  • Context compaction — when the conversation approaches the window size, older turns are summarized by the LLM into a single, fact-preserving summary turn (incremental, with a circuit breaker) instead of being silently truncated.
  • In-loop todo list — for multi-step tasks the agent maintains a checklist via write_todos (TodoWrite-style), re-injected each turn so it stays oriented; offered only for non-trivial tasks.
  • Inline multimodal — tool-returned images (screenshots, generated frames) are shown to the model directly (see Multi-Modal).
  • Parallel read-only tools — side-effect-free tool calls issued together (read/grep/glob) run concurrently; stateful calls stay serial.
  • Workspace awareness — a bounded top-level snapshot of the workspace is surfaced so the agent doesn’t guess whether a file exists.
User Goal
Bootstrap Contextmemorymcp + ragmcp
Route to LLMSupervisor: local / fast / cloud
ReAct Loop
  1. Build system prompt — Constitution + context
  2. Call LLM — with fallback
  3. If tool_calls — execute via orchestrator, auto-heal on failure, audit + cost tracking, then loop back to step 1
  4. If text response — task complete, store outcome in memory
Return Task Resultstatus, turns, cost

LTP Engine (Lean Task Protocol)

The LTP engine compiles natural language plans into deterministic execution graphs. Unlike ReAct (which reasons at every step), LTP compiles once and executes deterministically — faster, cheaper, and predictable for structured tasks.

Hybrid mode (default)

Hybrid mode auto-selects between ReAct and LTP based on task structure:

# Hybrid mode (default) -- kernel decides
result = await kernel.run_task("deploy the API to staging and production")

# Force LTP mode
result = await kernel.run_task("deploy the API", mode="ltp")

# Force ReAct mode
result = await kernel.run_task("debug the auth bug", mode="react")

LTP directives

DirectiveDescriptionExample
@PARALLELExecute steps concurrentlySTEP 2a @PARALLEL: run tests
ON_FAILError handling per stepON_FAIL: RE-PLAN
FOREACHIterate over a collectionFOREACH env IN [staging, prod]:
RE-PLANDynamic replanning on failureON_FAIL: RE-PLAN
Dot notationAccess data from previous steps{{step1.output.url}}
Type castingCast values between types{{step2.count | int}}

Compiled plan example

STEP 1: search_documents "API authentication patterns"
STEP 2a @PARALLEL: write_file api/auth.py
STEP 2b @PARALLEL: write_file tests/test_auth.py
STEP 3: execute_code "pytest tests/test_auth.py"
  ON_FAIL: RE-PLAN
STEP 4: FOREACH env IN [staging, production]:
  deploy --target {{env}} --artifact {{step2a.output.path}}

Agent-JIT cache (experimental, situational)

Many workloads repeat the same kind of task with different parameters (“sum of squares to 100”, then “…to 900”). Agent-JIT amortizes those repeats. It is experimental and off by default — a genuine win on the right workload, but not a universal speedup (measured details below).

  • First sighting (cold). The task runs through the normal engine. Its winning solution (the execute_code pattern) is cached, keyed by a semantic signature — the goal with numbers masked, embedded locally (fastembed) so paraphrases of the same family match.
  • Second sighting (shadow). The cached pattern is adapted to the new goal and run, and the cold engine runs too. Their outputs are compared deterministically (not by an LLM). A match marks the family trusted.
  • Trusted (warm). Later instances skip cold reasoning entirely: adapt the validated pattern and execute it. Much cheaper, and safe — trust came from exact output comparison, not a model’s judgement.
  • Correctness is safe; cost is not always. A warm answer is only used after deterministic output validation, and any mismatch falls back to the full engine — so it never ships an unvalidated answer. But it is not “never cheaper”: measured, a warm reuse is ~34× cheaper (~330 vs ~11k tokens) once engaged, yet engagement requires the family to reliably route through execute_code (non-deterministic for simple tasks) and the cold runs to validate the shadow. On low-repetition or non-code-routed traffic it can be net-neutral to ~+15% (an un-amortized shadow pass). That’s why it’s experimental and off by default — turn it on for known repetitive, code-heavy workloads, not as a blanket optimization.

Off by default — opt in per kernel:

kernel = KernelFactory.from_env()              # honours KERNELMCP_JIT=1
# or explicitly
kernel = KernelFactory.create(..., jit=True)

Inspect or clear the cache (optionally per namespace) from any surface:

kernel.jit_stats()   # {"enabled": True, "families": 3, "trusted": 2, "total_hits": 41, ...}
kernel.jit_clear()   # drop cached patterns; returns the count removed
kernelmcp run "sum of squares to 500" --jit
kernelmcp jit stats
kernelmcp jit clear

Also exposed as MCP tools (jit_stats, jit_clear), HTTP endpoints (GET /jit/stats, POST /jit/clear), and a toggle in the Hub’s Engine settings. Patterns persist to ~/.kernelmcp/jit_cache.json and are isolated per namespace (tenant-safe).


schedulermcp

Event-driven task scheduling with four schedule types:

TypeDescriptionExample
onceRun once at a specific timerun_at: "2026-04-28T09:00:00"
cronCron expression schedulecron_expr: "0 9 * * *"
intervalRun every N secondsinterval_seconds: 300
watchTrigger on condition changewatch_command + watch_condition

Watch jobs

Watch jobs monitor a command’s output and trigger when conditions are met:

# Watch for errors in the log file
await kernel.call_tool("schedule_task", {
    "name": "error_watcher",
    "schedule_type": "watch",
    "watch_command": "tail -1 /var/log/app.log",
    "watch_condition": "contains:ERROR",
    "task": "Analyze the error and suggest a fix"
})

Host System Access

Secure host commands and file operations via HostGuard whitelist:

ToolDescription
host_execExecute a whitelisted command on the host
host_file_readRead a file from the host filesystem
host_file_writeWrite content to a file on the host
host_file_copyCopy a file on the host
host_file_listList files in a host directory

All operations are gated by HostGuard — commands and paths must be explicitly whitelisted. Unapproved operations are rejected.


Web Search & Browser Fetch (via websearchmcp)

The kernel delegates web search and browser fetch to websearchmcp, which provides:

  • SearXNG — self-hosted meta search engine with multi-engine rotation (Google, DuckDuckGo, Brave, Bing). Privacy-respecting, no tracking.
  • browser_fetch — Playwright-based headless browser for rendering JavaScript-heavy pages before extracting content.
# The kernel routes web tools to websearchmcp automatically
content = await kernel.call_tool("browser_fetch", {
    "url": "https://example.com/dashboard",
    "wait_for": "networkidle"
})

MCP Server Modes

kernelmcp exposes three MCP server modes via the --mode flag. Each mode changes what tools are exposed and how LLM reasoning works.

agent mode (default)

The kernel has its own LLM and drives the full ReAct/LTP loop internally. The client sends a goal, the kernel plans, executes, heals, and returns the result. 32 orchestration tools are exposed (run_task, spawn_agent, get_task_status, etc.).

Best for: autonomous agents, headless deployments, programmatic usage.

kernelmcp start --mode agent

Claude Desktop config:

{
  "mcpServers": {
    "kernelmcp": {
      "command": "kernelmcp",
      "args": ["start", "--mode", "agent"]
    }
  }
}

router mode

The client LLM drives reasoning — the kernel just routes tool calls to the correct sub-server. ALL 90+ suite tools are exposed directly to the client. Zero double API calls: the client’s LLM calls tools directly without an intermediary LLM layer.

Best for: Claude Desktop, Cursor, VS Code, or any MCP client with its own LLM.

kernelmcp start --mode router

Claude Desktop config:

{
  "mcpServers": {
    "kernelmcp": {
      "command": "kernelmcp",
      "args": ["start", "--mode", "router"]
    }
  }
}

sampling mode

Same as agent mode (kernel drives the ReAct/LTP loop), but the kernel uses the client’s LLM via MCP sampling instead of its own API key. The SamplingLLMGateway routes LLM calls through the host application (VS Code, AWS Bedrock, future Claude Desktop sampling).

Best for: environments where you don’t want to manage a separate API key, or when the client already has LLM access.

kernelmcp start --mode agent --sampling

Claude Desktop config:

{
  "mcpServers": {
    "kernelmcp": {
      "command": "kernelmcp",
      "args": ["start", "--mode", "agent", "--sampling"]
    }
  }
}

Mode comparison

agent (default)routersampling
Who reasonsKernel’s LLMClient’s LLMClient’s LLM (via sampling)
Tools exposed32 orchestration toolsALL 90+ suite tools32 orchestration tools
Double API callsYes (client + kernel)NoNo
Needs API keyYesNoNo
ReAct/LTP loopKernel-drivenN/A (client drives)Kernel-driven
Best forAutonomous agentsClaude Desktop, Cursor, VS CodeNo-API-key deployments

MCP Sampling

The SamplingLLMGateway enables MCP Sampling passthrough — route LLM calls through the host application (VS Code, AWS Bedrock) instead of managing API keys directly.

kernel = KernelFactory.create(llm_gateway="sampling")

Tenant Isolation

workspacemcp supports per-tenant isolation for multi-tenant deployments:

workspace:
  tenant_isolation: true
  base_path: /data/tenants

Each tenant gets isolated file storage, checkpoints, and artifacts. Cross-tenant access is blocked.


Quick Start

3-line usage

from kernelmcp import KernelFactory

kernel = KernelFactory.from_env()
result = await kernel.run("analyze the auth module and suggest improvements")
print(result.summary)

Full suite (all 7 servers wired in-process)

from kernelmcp import KernelFactory

kernel = KernelFactory.full_suite(
    llm_model="claude-sonnet-4-6",
    api_key="sk-...",
    namespace="my_project",
)

result = await kernel.run(
    goal="migrate the database schema to v2",
    budget_usd=1.0,
    mode="hybrid",
)
print(f"Summary: {result.summary}")
print(f"Steps: {len(result.steps_taken)} | Cost: ${result.cost_usd:.4f}")

# Direct tool calls (no LLM)
await kernel.call_tool("schedule_task", {
    "goal": "daily backup", "job_type": "cron", "cron": "0 2 * * *"
})

Multi-Agent TaskForce

from kernelmcp import KernelFactory
from kernelmcp.agents.taskforce import TaskForce

kernel = KernelFactory.from_env()

# Launch a pre-built taskforce (templates: coding, research, writing, analysis)
tf = TaskForce.create("coding", goal="Build a payment API", registry=kernel._agent_registry)
result = await tf.run()
print(f"Agents used: {len(result.agent_results)} | Tokens: {result.total_tokens}")

MCP Server

from kernelmcp.factory import KernelFactory
from kernelmcp.mcp_server import KernelMCPServer

kernel = KernelFactory.from_env()
KernelMCPServer(kernel).run()

Or from the command line:

# Agent mode (default) -- kernel drives ReAct/LTP, exposes 32 tools
kernelmcp start

# Router mode -- client LLM drives, ALL 90+ suite tools exposed, zero double API calls
kernelmcp start --mode router

# Sampling mode -- agent mode but uses client's LLM via MCP sampling
kernelmcp start --mode agent --sampling

Claude Desktop claude_desktop_config.json (router mode recommended):

{
  "mcpServers": {
    "kernelmcp": {
      "command": "kernelmcp",
      "args": ["start", "--mode", "router"]
    }
  }
}

See MCP Server Modes for details on all three modes.


Features

Multi-Agent & Orchestration

  • :people_holding_hands: Multi-Agent TaskForce — compose agent teams with 5 patterns: sequential, parallel, supervisor, debate, swarm
  • :busts_in_silhouette: 7 Agent Types — code, research, file, memory, plan, custom, meta — each with focused tools and constitution
  • :jigsaw: SharedMemory + MessageBus — inter-agent collaboration with shared state and messaging
  • :shield: SandboxScope — per-agent isolation for safe concurrent execution
  • :arrows_counterclockwise: Handoff Protocol — structured context transfer between agents
  • :brain: MetaAgent — self-improving: analyzes failures, proposes constitution changes, generates LTP templates

Engine

  • :brain: ReAct + LTP Hybrid Engine — ReAct for exploratory tasks, LTP for structured execution, hybrid mode auto-selects
  • :rocket: LTP Compiler — Lean Task Protocol: compile once, execute deterministically. @PARALLEL, ON_FAIL, FOREACH, RE-PLAN, dot notation, type casting
  • :robot: Hybrid Router — routes simple tasks to local models, complex to cloud (cost optimization)
  • :scroll: Constitution — hardcoded PM persona with rules across 5 suite servers constraining the LLM
  • :wrench: Auto-Healing — detects execution failures and injects fix-retry prompts automatically

Observability & Debugging

  • :mag_right: Tracer — span-based tracing for every operation with optional OTel bridge
  • :bar_chart: Analytics — token efficiency, tool/agent performance, cost breakdown, bottleneck detection
  • :rewind: ReplayEngine — timeline, state_at, fork, compare for post-mortem debugging
  • :clipboard: Immutable Audit Trail — every LLM call and tool invocation logged with per-model cost tracking
  • :satellite: 26 Event Types — async event bus with subscribe/emit/stream
  • :ocean: TaskStream — real-time SSE streaming of task execution events

Persistence & Scaling

  • :floppy_disk: SQLite Checkpointing — auto-checkpoint before tasks, restore from any checkpoint
  • :chart_with_upwards_trend: BudgetScaler — max cost/tokens per hour, max concurrent agents
  • :whale: Docker Compose — production-ready deployment stack
  • :cloud: Kubernetes — manifests for cloud-native scaling

Resilience

  • :moneybag: Budget Enforcer — hard caps on tokens and cost per task and per namespace
  • :zap: LLM Fallback Chain — primary -> secondary -> tertiary model failover
  • :repeat: Retry with Backoff — exponential backoff on transient failures
  • :traffic_light: Rate Limiter — max tasks per minute per namespace
  • :electric_plug: Circuit Breaker — disables a server after N consecutive failures, auto-resets

Infrastructure

  • :link: Suite Orchestrator — wires all servers into a unified tool registry (120+ tools)
  • :gear: KernelFactorydefault() / create() / from_env() / from_yaml() / full_suite()
  • :desktop_computer: MCP Server (3 modes) — agent (kernel-driven ReAct), router (client-driven, 90+ suite tools, zero double API calls), sampling (agent + client LLM)
  • :keyboard: CLI — start, taskforce, templates, new, deploy, eval, improve, cost, and more
  • :alarm_clock: schedulermcp — cron, interval, and watch jobs for event-driven automation
  • :mag: Web Search — via websearchmcp (SearXNG, DuckDuckGo, Mojeek, Brave + Playwright browser rendering)
  • :globe_with_meridians: Dynamic MCP tools — connect any external MCP server at runtime (stdio/SSE)
  • :computer: Host Access — host_exec, host_file_read/write/copy/list with HostGuard security whitelist
  • :handshake: MCP Sampling — SamplingLLMGateway for passthrough to VS Code / Bedrock / future Claude Desktop
  • :lock: Tenant Isolation — workspacemcp per-tenant file, memory, and execution isolation

Knowledge & Memory

  • :brain: Self-RAG — retrieve, generate, self-critique, re-retrieve for verified factual answers
  • :zap: ReAct RAG — multi-step iterative reasoning with multiple searches for complex questions
  • :bar_chart: RAGAS Eval — 5-metric evaluation (context relevancy, precision, faithfulness, answer correctness)
  • :bust_in_silhouette: User Profiles — personalized search ranking based on user preferences
  • :shield: GDPR forget_user — permanently delete all data for a namespace
  • :file_folder: Folder & URL Ingest — ingest entire folders or download documents from URLs
  • :memo: Auto-Episodes — conversation turns automatically stored as episodic memory
  • :gear: Full Context Assembly — persona + working memory + episodes + facts + RAG docs injected at every task

Installation

# Core kernel (no suite libraries)
pip install mcpaisuite-kernelmcp

# With specific libraries
pip install "mcpaisuite-kernelmcp[memorymcp,planningmcp]"

# Full suite (all servers + webhooks + observability)
pip install "mcpaisuite-kernelmcp[all]"
# [all] pulls in: memorymcp, planningmcp, ragmcp, workspacemcp,
#                 sandboxmcp, schedulermcp + webhooks + REST API.
# (websearchmcp, ltpmcp, evalmcp are core deps — always installed.)

# Development
pip install -e ".[dev]"

Requirements: Python 3.11+


Constitution

The Constitution is a hardcoded system prompt that constrains every LLM call. It defines the PM persona and rule domains spanning all 5 suite servers:

DomainKey rules
PlanningALWAYS create a plan before executing. Never skip steps. Fix failures before advancing.
MemoryRelevant memories are injected at task start. Store important outcomes for future reference.
KnowledgeSearch ragmcp before guessing. Use document context to inform decisions.
WorkspaceRoute ALL code to write_file first. Create checkpoints before modifications.
ExecutionExecute through sandboxmcp only. Debug failures: read -> fix -> re-execute. Never hardcode secrets.
GeneralBe concise. Ask for clarification on ambiguity. Prefer local models for simple lookups.

The Constitution can be updated at runtime via the set_constitution MCP tool or programmatically:

kernel._engine._constitution.update_rules("Your custom rules here...")

Memory context and RAG context are injected into the system prompt dynamically at each turn.


Hybrid Router

The TaskSupervisor estimates task complexity (0.0 to 1.0) and routes to the optimal model:

ComplexityRangeModelUse case
Simple< 0.3ollama/mistral (local)Status checks, lookups, simple queries
Medium0.3 - 0.7claude-haiku-4-5 (fast)Summaries, explanations, searches
Complex> 0.7claude-sonnet-4-6 (cloud)Code generation, architecture, multi-step tasks

Complexity scoring uses keyword analysis, task length, multi-step indicators, and code-related terms. Routing can be disabled to always use the cloud model:

kernel = KernelFactory.create(enable_routing=False)

Auto-Healing

When a tool execution fails (specifically execute_code via sandboxmcp), the engine automatically injects a system prompt instructing the LLM to:

  1. Analyze the error from stderr
  2. Fix the code using workspacemcp.edit_file
  3. Re-execute using sandboxmcp.execute_code
  4. NOT advance the plan until exit_code == 0

This creates a self-correcting loop without human intervention. Auto-healing can be disabled:

kernel = KernelFactory.create(auto_heal=False)

Resilience

kernelmcp includes three resilience layers that protect against failures and runaway costs:

Budget Enforcer

Hard caps on tokens and cost per task. When exceeded, the task is marked as failed immediately.

kernel = KernelFactory.create(
    max_tokens_per_task=50000,   # token ceiling
    max_cost_per_task=1.0,       # dollar ceiling
)

Per-namespace spend tracking is available via BudgetEnforcer.get_spent(namespace).

Rate Limiter

Limits the number of tasks per minute per namespace (default: 30/min). Tasks that exceed the limit are rejected immediately.

Circuit Breaker

Disables a server after N consecutive failures (default: 3). Automatically resets after a cooldown period (default: 60 seconds). Prevents cascading failures when a suite library is down.

LLM Fallback Chain

If the primary LLM fails, the engine falls back through a chain of models:

claude-sonnet-4-6  -->  gpt-4o  -->  ollama/mistral

Retry with Backoff

Failed tool calls are retried with exponential backoff (base delay: 1s, max delay: 30s, default retries: 1).


Audit Trail

Every LLM call and tool invocation is logged as an immutable AuditEntry with:

  • Task ID, action type, model name, tool name
  • Token count and cost
  • Success/failure status
  • Namespace and timestamp

Two backends are available:

BackendUse case
InMemoryAuditLoggerDevelopment, testing
SQLiteAuditLoggerProduction — persistent, queryable

Per-model cost tracking

# Get cost breakdown by model
costs = await audit.cost_by_model(namespace="default")
# {"claude-sonnet-4-6": {"cost": 0.42, "tokens": 12500}, "ollama/mistral": {"cost": 0.0, "tokens": 800}}

Querying the audit trail

# Recent entries
entries = await audit.query(namespace="default", action="llm_call", limit=50)

# Total entry count
count = await audit.count(namespace="default")

MCP Tools

The full suite exposes 120+ tools across all servers. The kernel itself provides orchestration, observability, and persistence tools via the MCP protocol (stdio transport):

Orchestration Tools

ToolDescription
run_taskSubmit a task for autonomous execution
get_task_statusCheck task progress
list_tasksList all tasks
cancel_taskAbort a running task
get_turnsGet ReAct turn history
spawn_agentSpawn a specialized agent (code, research, file, memory, plan, custom, meta)
create_taskforceCreate a multi-agent taskforce
kernel_statsToken costs, latency, model usage
kernel_configView kernel configuration
set_constitutionUpdate PM rules
trigger_webhookFire a webhook event
healthKernel health + server connections

Observability Tools

ToolDescription
get_traceRetrieve the full trace for a task
get_analyticsGet analytics summary for a namespace
compare_runsCompare two task runs side by side

Persistence Tools

ToolDescription
checkpointCreate a checkpoint for a running task
restoreRestore a task from a checkpoint
list_checkpointsList available checkpoints

Agent & Background Tools

ToolDescription
list_agentsList available sub-agent types
run_backgroundRun a task asynchronously, returns an operation ID
get_operationCheck the status of a background operation
list_operationsList all background operations

Introspection Tools

ToolDescription
kernel_auditView kernel task audit log (completions, failures, costs)
improveAnalyze kernel performance and suggest improvements
list_taskforce_templatesList available TaskForce templates with agent configs
list_taskforce_examplesList pre-built taskforce configs (secure_coding, research_verify, enterprise)

CLI

kernelmcp start                                       # Start the kernel MCP server
kernelmcp start --mode router                         # Start in router mode
kernelmcp start --sampling                            # Use the client's LLM via MCP sampling
kernelmcp taskforce "Build REST API" --template coding   # Launch a multi-agent taskforce
kernelmcp taskforce "Analyze sales data" --pattern debate  # Custom pattern
kernelmcp templates                                   # List available taskforce templates
kernelmcp new coding "Build a REST API"               # Generate a TaskForce config file from a template
kernelmcp deploy taskforce.json                       # Deploy (run) a TaskForce from a JSON config file
kernelmcp eval --suite memory                         # Run evaluation benchmarks
kernelmcp improve --dry-run                           # Preview MetaAgent improvements
kernelmcp improve                                     # Apply MetaAgent improvements
kernelmcp cost                                        # Show token cost summary
kernelmcp run "migrate the database"                  # Execute a task interactively
kernelmcp agents                                      # List available agent types
kernelmcp status                                      # Show kernel health and connected servers
kernelmcp stop                                        # Stop the kernel daemon
kernelmcp logs --tail 50                              # View recent events
kernelmcp config                                      # View kernel configuration
kernelmcp servers                                     # List connected MCP servers and tool count

Events (26 types)

kernelmcp emits 26 event types through an async event bus. Subscribe to monitor task execution in real time.

EventEmitted when
task.startedA task begins execution
task.completedA task finishes successfully
task.failedA task fails (budget, max turns, error)
task.cancelledA task is cancelled
turn.startedA new ReAct turn begins
turn.completedA ReAct turn finishes
tool.calledA tool is about to be executed
tool.succeededA tool call returns successfully
tool.failedA tool call fails
plan.enforcedA plan step is enforced
auto_heal.triggeredAuto-healing activates after execution failure
context.bootstrappedMemory context is loaded at task start
context.trimmedContext window is trimmed to fit token limits
llm.calledAn LLM call is made
llm.deltaA streaming token/delta is emitted
llm.routedThe supervisor selects a model
webhook.receivedAn external webhook is received
taskforce.startedA TaskForce begins execution
taskforce.completedA TaskForce finishes successfully
taskforce.failedA TaskForce fails
agent.handoffOne agent hands off context to another
agent.messageAn agent-to-agent message is sent
from kernelmcp.events import kernel_event_bus

# Stream all events
async for event in kernel_event_bus.stream():
    print(f"{event.type.value}: {event.data}")

# Subscribe with a queue
queue = kernel_event_bus.subscribe()
event = await queue.get()

Factory

KernelFactory provides multiple construction methods:

from kernelmcp import KernelFactory

# Zero config -- in-memory, no libraries, just LLM + ReAct
kernel = KernelFactory.default()

# From environment variables (KERNELMCP_MODEL, ANTHROPIC_API_KEY, etc.)
kernel = KernelFactory.from_env()

# From a YAML configuration file
kernel = KernelFactory.from_yaml("kernel_config.yaml")

# Full configuration
kernel = KernelFactory.create(
    llm_model="claude-sonnet-4-6",
    local_model="ollama/mistral",
    fast_model="claude-haiku-4-5-20251001",
    api_key="sk-...",
    enable_routing=True,
    max_turns=20,
    max_tokens_per_task=50000,
    max_cost_per_task=1.0,
    auto_plan=True,
    auto_heal=True,
    auto_memory=True,
    jit=False,                     # Agent-JIT: reuse shadow-validated solution patterns
    nano=False,                    # fast path for trivial single-shot tasks
    namespace="my_project",
    memory_pipeline=memory,        # from memorymcp
    planning_pipeline=planning,    # from planningmcp
    rag_pipeline=rag,              # from ragmcp
    workspace_pipeline=workspace,  # from workspacemcp
    sandbox_pipeline=sandbox,      # from sandboxmcp
)

# Full suite -- all 7 servers auto-detected and wired
kernel = KernelFactory.full_suite(
    llm_model="claude-sonnet-4-6",
    api_key="sk-...",
    namespace="default",
)

Environment variables

VariableDefaultDescription
KERNELMCP_MODELclaude-sonnet-4-6Primary LLM model
KERNELMCP_LOCAL_MODELollama/mistralLocal model for simple tasks
KERNELMCP_ROUTINGtrueEnable hybrid routing
KERNELMCP_MAX_TURNS20Maximum ReAct turns per task
KERNELMCP_MAX_TOKENS50000Token budget per task
KERNELMCP_JITfalseReuse shadow-validated solution patterns across repeated task families (Agent-JIT)
KERNELMCP_NANOfalseFast path for trivial single-shot tasks
KERNELMCP_NAMESPACEdefaultDefault namespace
ANTHROPIC_API_KEYAnthropic API key
OPENAI_API_KEYOpenAI API key (fallback)

Development

git clone https://github.com/gashel01/kernelmcp
cd kernelmcp
pip install -e ".[dev]"

# Run tests
pytest tests/ -v                    # 1285+ tests

# With coverage
pytest tests/ --cov=kernelmcp --cov-report=html

Project structure

kernelmcp/
  core/
    models.py          -- Task, Turn, ToolCall, KernelConfig; AgentType (7 types: code, research, file, memory, plan, custom, meta)
    engine.py          -- ReAct engine (the autonomous execution loop)
    ltp_runner.py      -- LTP compiler + deterministic executor (@PARALLEL, ON_FAIL, FOREACH, RE-PLAN)
    constitution.py    -- PM persona and hardcoded LLM constraints
    context.py         -- Context window manager
    audit.py           -- InMemory + SQLite audit loggers
    resilience.py      -- Fallback chain, retry, rate limiter
    bootstrap.py / tool_executor.py / tool_selection.py / nudges.py / ...
  agents/
    base.py            -- Agent base + search_tools meta-tool
    meta_agent.py      -- MetaAgent (self-improving: failure analysis, constitution updates)
    patterns.py        -- 5 patterns: sequential, parallel, supervisor, debate, swarm
    taskforce.py       -- TaskForce orchestrator + templates
    shared_memory.py / message_bus.py / sandbox_scope.py / graph_executor.py / registry.py
    code_agent.py / research_agent.py / file_agent.py / memory_agent.py
  routing/
    supervisor.py      -- Hybrid router (complexity estimation + model selection)
    llm_gateway.py     -- LLM abstraction layer (litellm)
    sampling_gateway.py -- SamplingLLMGateway for MCP Sampling
  observability/
    tracer.py          -- Span-based tracing + analytics
    replay.py          -- ReplayEngine: timeline, state_at, fork, compare
    hub.py             -- connect_hub() control-plane connector
    otel.py            -- OpenTelemetry export (optional)
  integration/
    orchestrator.py    -- Suite orchestrator (wires all servers) + host tools (HostGuard via sandboxmcp)
    mcp_client.py      -- Dynamic MCP server connections
    langchain_adapter.py -- LangChain tools bridge
  enterprise/          -- RBACManager, CostAllocator, SLAMonitor
  multimodal/          -- VisionAnalyzer, AudioTranscriber
  persistence/         -- SQLite checkpointing and recovery
  a2a/                 -- A2AServer / A2AClient / A2ABridge
  mcp_client/          -- MCP client utilities
  triggers/            -- Trigger / scheduling support
  api/                 -- FastAPI app (optional)
  scaling.py           -- BudgetScaler: cost/token/concurrency limits
  streaming.py         -- TaskStream with SSE events
  pipeline.py          -- KernelPipeline, BudgetEnforcer, CircuitBreaker
  events.py            -- Event bus (26 event types)
  factory.py           -- KernelFactory
  facade.py            -- High-level facade
  mcp_server.py        -- MCP server (32 tools)
  cli.py               -- CLI commands

Ecosystem & Interoperability

kernelmcp is the only AI agent framework that natively connects to three ecosystems:

Dynamic MCP Client

Connect to any of the 2000+ MCP servers at runtime:

await kernel.orchestrator.connect_mcp_server("github", transport="stdio", command="github-mcp-server")
await kernel.orchestrator.connect_mcp_server("slack", transport="sse", url="http://localhost:8080/sse")
# Tools appear automatically — the LLM can use them immediately

LangChain Tool Bridge

Use any of LangChain’s 500+ tools without leaving kernelmcp:

from langchain_community.tools import WikipediaQueryRun
kernel.orchestrator.register_langchain_tool(WikipediaQueryRun())
# lc__wikipedia is now available to the engine

License

AGPL-3.0 — see LICENSE.

For commercial licensing (closed-source usage), contact the author.