sandboxmcp

:shield: sandboxmcp — Secure, polyglot code execution for AI agents

Part of the MCP AI suite: ragmcp · memorymcp · planningmcp · workspacemcp · sandboxmcp

Philosophy: “Isolation is the Law, Execution is a Privilege.”

What is sandboxmcp?

sandboxmcp is a zero-trust code execution engine that lets AI agents run Python, Node.js, and Shell code in isolated sandboxes with full security controls:

Polyglot execution — Python, Node.js, and Shell in isolated subprocesses or Docker containers
2 backends — Process (subprocess + OS resource limits) and Docker (full container isolation)
Network deny-all — All network access is blocked by default; domains must be explicitly approved per-request
Secret vault — Inject environment variables securely; secrets are automatically redacted from all output
Artifact signing — Every output file is SHA-256 signed for tamper-proof verification
Resource guard — CPU, RAM, process count, and timeout limits enforced via OS primitives (rlimit / Windows Job Objects)
Async job queue — Backpressure-aware concurrency control prevents host OOM from parallel spawns
Immutable audit log — Every execution, package install, session, and egress event is recorded (InMemory or SQLite)
Code validation — Syntax checking and auto-fix before execution
Host access — Guarded host command execution with approval workflow
Web tools — Web search, page fetching, and browser rendering (lazily delegated to websearchmcp; requires pip install "mcpaisuite-sandboxmcp[browser]")
MCP server — 23 tools, stdio transport, compatible with Claude Desktop, Cursor, or any MCP client
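One way an "immutable" audit log can be kept in SQLite is with triggers that reject UPDATE and DELETE. The sketch below is an assumption-laden illustration: the `AppendOnlyAudit` class, the table name, and the columns are hypothetical and are not sandboxmcp's actual SQLite audit logger.

```python
import sqlite3
import time


class AppendOnlyAudit:
    """Hypothetical append-only audit table (illustrative schema only)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS audit (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                ts REAL NOT NULL,
                namespace TEXT NOT NULL,
                action TEXT NOT NULL,
                detail TEXT NOT NULL
            );
            -- Any attempt to rewrite or remove history aborts the statement.
            CREATE TRIGGER IF NOT EXISTS audit_no_update
            BEFORE UPDATE ON audit
            BEGIN SELECT RAISE(ABORT, 'audit log is append-only'); END;
            CREATE TRIGGER IF NOT EXISTS audit_no_delete
            BEFORE DELETE ON audit
            BEGIN SELECT RAISE(ABORT, 'audit log is append-only'); END;
            """
        )

    def record(self, namespace: str, action: str, detail: str) -> None:
        """Append one entry; the connection context manager commits it."""
        with self.conn:
            self.conn.execute(
                "INSERT INTO audit (ts, namespace, action, detail) VALUES (?, ?, ?, ?)",
                (time.time(), namespace, action, detail),
            )

    def entries(self, namespace: str, limit: int = 100) -> list:
        """Return (action, detail) pairs for a namespace, oldest first."""
        rows = self.conn.execute(
            "SELECT action, detail FROM audit WHERE namespace = ? ORDER BY id LIMIT ?",
            (namespace, limit),
        )
        return rows.fetchall()
```

Triggers only guard against changes made through SQL; anyone who can replace the database file can still rewrite history, so this is tamper resistance at the application level rather than a cryptographic guarantee.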

Execution Flow

ExecutionRequest ↓

Job Queue: backpressure

Vault: inject env

Network Guard: deny-all / allow

Process / Docker: backend execution

Artifact Signer: SHA-256 hash

Secret Mask: redact values

Audit Log: immutable

↓ SandboxResult

Every execution passes through the full zero-trust pipeline: Queue -> Vault -> Network -> Execute -> Sign -> Mask -> Audit.

Quick Start

3-line usage

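from sandboxmcp import SandboxFactory, ExecutionRequest

# The snippets in this README use top-level await: run them inside an async
# function, or in a REPL that supports it (for example `python -m asyncio`).
sandbox = SandboxFactory.default()
result = await sandbox.execute(ExecutionRequest(code="print('hello sandbox')"))
print(result.stdout)  # hello sandbox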

Multi-language execution

from sandboxmcp import SandboxFactory, ExecutionRequest, Language

sandbox = SandboxFactory.default()

# Python
result = await sandbox.execute(ExecutionRequest(code="print(2 + 2)", language=Language.python))

# Node.js
result = await sandbox.execute(ExecutionRequest(code="console.log(2 + 2)", language=Language.node))

# Shell
result = await sandbox.execute(ExecutionRequest(code="echo $((2 + 2))", language=Language.shell))

Docker backend

from sandboxmcp import SandboxFactory, ExecutionRequest

# Full container isolation with Docker
sandbox = SandboxFactory.create(
    default_backend="docker",
    memory_limit="256m",
    cpu_period=100000,
    cpu_quota=50000,
    network_mode="none",
)

result = await sandbox.execute(ExecutionRequest(code="print('isolated')"))

The Docker backend uses python:3.11-slim for Python, node:20-slim for Node.js, and ubuntu:22.04 for Shell. Install the optional dependency with pip install "mcpaisuite-sandboxmcp[docker]".

Process backend hardening (Linux, opt-in)

The plain process backend has no kernel isolation — it stops egress and resource abuse, but a process can still read and write host files (this is why the benchmark scores it 3/5, measured on the default backend). On Linux you can close that gap without Docker:

from sandboxmcp.backends.process_rt import ProcessBackend

# Wrap each run in Landlock + user/network namespaces (Linux only)
backend = ProcessBackend(hardened=True)

When hardened=True, each run is wrapped in:

Landlock (LSM, kernel ≥ 5.13) — restricts the process to read-only access to the system filesystem, so host file writes are denied at the kernel.
User + network namespaces (rootless) — when network is disallowed, the run drops into a fresh empty network namespace, making egress impossible at the kernel level (stronger than the userspace block).

Hardening degrades explicitly rather than silently: on a non-Linux host or a kernel without Landlock, supported() returns False, and the backend runs unhardened and reports that it did so. It never claims a containment it did not actually apply. For untrusted code, Docker remains the backend that contains all five host-impact escapes (5/5).
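A support probe of this kind might look like the following. It is a sketch under assumptions: the function name and parameters are hypothetical, and reading `/sys/kernel/security/lsm` is one common way to see which LSMs are active, not necessarily what sandboxmcp's supported() does.

```python
import platform
import re
from pathlib import Path
from typing import Optional


def landlock_probably_supported(
    system: Optional[str] = None,
    release: Optional[str] = None,
    lsm_list: Optional[str] = None,
) -> bool:
    """Best-effort check: Linux, kernel >= 5.13, and 'landlock' among the active LSMs.

    The parameters exist only so the logic can be exercised without a real host;
    when omitted, values are read from the running system.
    """
    system = platform.system() if system is None else system
    if system != "Linux":
        return False

    release = platform.release() if release is None else release
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match or (int(match[1]), int(match[2])) < (5, 13):
        return False

    if lsm_list is None:
        try:
            lsm_list = Path("/sys/kernel/security/lsm").read_text()
        except OSError:
            # securityfs not mounted or unreadable: treat as unsupported.
            return False
    return "landlock" in lsm_list.strip().split(",")
```

Returning False on any doubt matches the behavior described above: the caller can then run unhardened and say so, instead of assuming isolation that was never applied.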

MCP server

from sandboxmcp import SandboxFactory
from sandboxmcp.mcp_server import SandboxMCPServer

sandbox = SandboxFactory.default()
SandboxMCPServer(sandbox).run()

Or from the command line:

sandboxmcp serve

Claude Desktop claude_desktop_config.json:

{
  "mcpServers": {
    "sandboxmcp": {
      "command": "sandboxmcp",
      "args": ["serve"]
    }
  }
}

Features

:shield: Zero-trust pipeline — Queue -> Vault -> Network -> Execute -> Sign -> Mask -> Audit
:snake: Polyglot — Python, Node.js, and Shell execution in isolated subprocesses or Docker containers
:lock: Network deny-all — All network blocked by default; DNS/socket/proxy-level enforcement
:key: Secret vault — InMemory or EnvVar vault; secrets auto-redacted from stdout/stderr
:pencil2: Artifact signing — SHA-256 hash on every output file; verify integrity anytime
:bar_chart: Resource guard — CPU, RAM, process count, timeout enforced via rlimit (Linux/macOS) or Job Objects (Windows)
:penguin: Process backend hardening (Linux, opt-in) — ProcessBackend(hardened=True) adds Landlock (read-only system FS) + user/network namespaces for kernel-level isolation without Docker; transparently runs unhardened (and reports it) where unsupported
:hourglass_flowing_sand: Async job queue — Semaphore-based backpressure; configurable max concurrency
:scroll: Immutable audit log — InMemory or SQLite; every action recorded with timestamp and detail
:electric_plug: MCP server — 23 tools, stdio transport, compatible with any MCP client
:computer: CLI — 23 commands for execution, sessions, vault, audit, validation, web, and host access
:gear: Stateful sessions — Create, track, and kill long-running sandbox sessions
:bridge_at_night: Cross-language bridge — Pass JSON data between Python and Node.js sessions
:satellite: 14 event types — Subscribe to execution, session, egress, vault, and queue events
:factory: SandboxFactory — default() / create() / from_env() / from_yaml()
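The semaphore-based backpressure named in the feature list can be pictured with a few lines of asyncio. This is a minimal sketch, assuming a hypothetical `BoundedJobQueue` class; it is not the library's AsyncJobQueue, only an illustration of how a semaphore caps concurrent jobs.

```python
import asyncio


class BoundedJobQueue:
    """Hypothetical limiter: at most `max_concurrent` jobs run at once."""

    def __init__(self, max_concurrent: int = 4) -> None:
        self._slots = asyncio.Semaphore(max_concurrent)
        self.running = 0
        self.peak = 0  # highest concurrency observed, for inspection

    async def submit(self, job):
        """Wait for a free slot, then await the zero-argument coroutine function `job`."""
        async with self._slots:
            self.running += 1
            self.peak = max(self.peak, self.running)
            try:
                return await job()
            finally:
                self.running -= 1


async def demo() -> int:
    queue = BoundedJobQueue(max_concurrent=2)

    async def job() -> int:
        await asyncio.sleep(0.01)  # stand-in for a sandboxed execution
        return 1

    # Six submissions, but only two may hold a slot at any moment.
    results = await asyncio.gather(*(queue.submit(job) for _ in range(6)))
    assert sum(results) == 6
    return queue.peak
```

Callers that exceed the limit simply wait at `async with`, which is what keeps a burst of parallel requests from spawning an unbounded number of processes on the host.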

Installation

# Minimal — subprocess backend, no external services
pip install mcpaisuite-sandboxmcp

# With Docker backend
pip install "mcpaisuite-sandboxmcp[docker]"

# With web tools (web_search / fetch_webpage / browser_fetch)
pip install "mcpaisuite-sandboxmcp[browser]"

# With suite integrations
pip install "mcpaisuite-sandboxmcp[workspacemcp]"
pip install "mcpaisuite-sandboxmcp[planningmcp]"

# Full stack (Docker + browser + suite integrations)
pip install "mcpaisuite-sandboxmcp[all]"

Requirements: Python 3.11+

Security

Secret Vault

sandboxmcp provides two vault implementations for injecting secrets into sandbox processes without exposing them to the LLM:

InMemoryVault — Namespace-isolated in-memory secret store. Secrets are automatically masked in all output with ***REDACTED***.

EnvVault — Reads secrets from OS environment variables with a namespace prefix (SANDBOXMCP_SECRET_{NAMESPACE}_{KEY}).

# Store a secret
await sandbox.vault_add("default", "API_KEY", "sk-1234567890")

# Code can access it via env var, but the value is redacted in output
result = await sandbox.execute(ExecutionRequest(
    code="import os; print(os.environ.get('API_KEY'))"
))
# stdout: ***REDACTED***
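Output redaction of this kind can be approximated by replacing every known secret value in the captured text. The sketch below assumes a hypothetical `mask_secrets` helper and reuses the `***REDACTED***` marker shown above; the library's real masking logic is not documented here and may differ.

```python
from typing import Mapping

REDACTED = "***REDACTED***"


def mask_secrets(text: str, secrets: Mapping[str, str]) -> str:
    """Replace every occurrence of a secret value in `text` with the marker."""
    # Longest values first, so a secret that contains another is masked whole.
    for value in sorted(secrets.values(), key=len, reverse=True):
        if value:  # replacing "" would insert the marker between every character
            text = text.replace(value, REDACTED)
    return text
```

Plain substring replacement only catches secrets printed verbatim; a value that is base64-encoded, reversed, or split across lines by the executed code would pass through, so masking complements isolation rather than replacing it.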

Network Egress Guard

All network access is denied by default. The NetworkGuard enforces this at three levels:

Proxy blocking — http_proxy / https_proxy set to 0.0.0.0:0
Socket/DNS patching — Python socket.connect, Node.js dns.lookup and net.connect are overridden
Shell aliasing — curl and wget aliased to blocked messages
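To make the socket-patching level concrete, here is a minimal sketch of a userspace deny-by-default guard for Python. The `deny_network` context manager is hypothetical and written for illustration; it is not the NetworkGuard implementation.

```python
import contextlib
import socket


@contextlib.contextmanager
def deny_network(allowed_hosts=frozenset()):
    """Temporarily make socket.connect refuse any host not in `allowed_hosts`."""
    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        # AF_INET / AF_INET6 addresses are tuples whose first item is the host.
        host = address[0] if isinstance(address, tuple) else address
        if host not in allowed_hosts:
            raise PermissionError(f"network egress blocked: {host!r}")
        return original_connect(self, address)

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        socket.socket.connect = original_connect
```

Most HTTP clients resolve the hostname first and then connect to an IP address, which is why a hostname allowlist also needs a DNS-level hook. A patch like this lives inside the sandboxed interpreter, so code that restores the original function or uses another binding can bypass it; kernel-level isolation does not have that weakness.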

To allow specific domains, pass allowed_domains in the ExecutionRequest. Human-in-the-loop approval is available via request_egress:

# Per-request allowlist
result = await sandbox.execute(ExecutionRequest(
    code="import urllib.request; print(urllib.request.urlopen('https://api.example.com').read())",
    allowed_domains=["api.example.com"],
))

# Human-in-the-loop approval (blocks until approved/denied/timeout).
# request_egress(domain, namespace) takes positional args.
from sandboxmcp.security.egress import NetworkGuard

network_guard = NetworkGuard(enable_network=True)
approved = await network_guard.request_egress("api.example.com", "default")

# Meanwhile, an approval channel (UI/callback) running in another task resolves
# the pending request; calling it after the await above would be too late:
network_guard.approve_egress("default", "api.example.com")  # -> request_egress returns True
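The blocking request and the separate approval call fit together naturally with one future per pending request. The following is a sketch, assuming a hypothetical `ApprovalGate` class whose method names and argument order mirror the calls above; the timeout handling and internals are illustrative assumptions.

```python
import asyncio


class ApprovalGate:
    """Hypothetical human-in-the-loop gate built on asyncio futures."""

    def __init__(self, timeout: float = 30.0) -> None:
        self._timeout = timeout
        self._pending = {}  # (namespace, domain) -> Future[bool]

    async def request_egress(self, domain: str, namespace: str) -> bool:
        """Block until approved or denied; a timeout counts as a denial."""
        future = asyncio.get_running_loop().create_future()
        self._pending[(namespace, domain)] = future
        try:
            return await asyncio.wait_for(future, self._timeout)
        except asyncio.TimeoutError:
            return False
        finally:
            self._pending.pop((namespace, domain), None)

    def _resolve(self, namespace: str, domain: str, decision: bool) -> None:
        future = self._pending.get((namespace, domain))
        if future is not None and not future.done():
            future.set_result(decision)

    def approve_egress(self, namespace: str, domain: str) -> None:
        self._resolve(namespace, domain, True)

    def deny_egress(self, namespace: str, domain: str) -> None:
        self._resolve(namespace, domain, False)


async def demo() -> tuple:
    gate = ApprovalGate(timeout=0.05)
    request = asyncio.create_task(gate.request_egress("api.example.com", "default"))
    await asyncio.sleep(0)  # let the request register itself as pending
    gate.approve_egress("default", "api.example.com")
    approved = await request
    # Nobody answers this one, so it falls back to "denied" after the timeout.
    timed_out = await gate.request_egress("other.example.com", "default")
    return approved, timed_out
```

Treating a timeout as a denial keeps the default consistent with deny-all: silence from the approver never opens the network.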

Resource Guard

OS-level enforcement of resource limits per execution:

Limit	Default	Hard Cap	Mechanism
RAM	512 MB	8 GB	`RLIMIT_DATA` (Linux/macOS) / Job Objects (Windows)
CPU time	60 s	3600 s	`RLIMIT_CPU` (Linux/macOS) / `asyncio.wait_for`
Processes	10	—	`RLIMIT_NPROC` (Linux) / Job Objects (Windows)
Output	1 MB	—	Truncation after decode

from sandboxmcp import ExecutionRequest, ResourceLimits

result = await sandbox.execute(ExecutionRequest(
    code="print('constrained')",
    resource_limits=ResourceLimits(
        max_ram_mb=256,
        max_cpu_cores=1,
        timeout_seconds=30,
        max_processes=5,
        max_output_bytes=500_000,
    ),
))
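On Linux and macOS, limits like those in the table can be applied to a child process with the standard `resource` module. This is a sketch under assumptions: `run_limited` and `_cap` are hypothetical helpers, not the ResourceGuard API, and the example is POSIX-only.

```python
import resource
import subprocess
import sys


def _cap(kind: int, wanted: int) -> None:
    """Set soft and hard limits for `kind`, never above the current hard limit."""
    _, hard = resource.getrlimit(kind)
    value = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    resource.setrlimit(kind, (value, value))


def run_limited(code: str, ram_mb: int = 512, cpu_seconds: int = 60,
                timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run Python code in a child process with RLIMIT_DATA and RLIMIT_CPU applied."""

    def apply_limits() -> None:
        # Runs in the child between fork() and exec(), so the parent is unaffected.
        _cap(resource.RLIMIT_DATA, ram_mb * 1024 * 1024)
        _cap(resource.RLIMIT_CPU, cpu_seconds)

    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,          # wall-clock limit, enforced by the parent
        preexec_fn=apply_limits,  # POSIX only
    )
```

The CPU limit counts processor time, not elapsed time, so a process that sleeps or waits on I/O never reaches it; that is why a separate wall-clock timeout is still needed.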

Artifact Signing

Every output file produced by sandbox execution is automatically hashed with SHA-256. Artifacts include the hash, size, and base64-encoded content, so a consumer can recompute the hash and detect tampering:

from sandboxmcp.security.crypto import ArtifactSigner

# Sign manually
data_bytes = b"id,total\n1,42\n"  # example payload
artifact = ArtifactSigner.sign("report.csv", data_bytes)

# Verify integrity
assert ArtifactSigner.verify(artifact, artifact.sha256)

# Generate a manifest for multiple artifacts
manifest = ArtifactSigner.sign_result_artifacts(result.artifacts)
# {"report.csv": "a1b2c3...", "chart.png": "d4e5f6..."}
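The hashing and verification steps can be reproduced with the standard library alone. This is a minimal sketch, assuming a hypothetical `SignedArtifact` record with the fields named above (hash, size, base64 content); it is not the ArtifactSigner source.

```python
import base64
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedArtifact:
    """Hypothetical record: file name, SHA-256 hex digest, size, base64 content."""
    name: str
    sha256: str
    size: int
    content_b64: str


def sign(name: str, data: bytes) -> SignedArtifact:
    """Hash the bytes and package them with their digest and size."""
    return SignedArtifact(
        name=name,
        sha256=hashlib.sha256(data).hexdigest(),
        size=len(data),
        content_b64=base64.b64encode(data).decode("ascii"),
    )


def verify(artifact: SignedArtifact, expected_sha256: str) -> bool:
    """Recompute the digest from the stored content and compare it to the expected one."""
    data = base64.b64decode(artifact.content_b64)
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A bare hash detects modification only when the expected digest is kept somewhere the modifier cannot reach, such as the audit log; unlike a keyed signature, it does not prove who produced the file.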

MCP Tools

sandboxmcp exposes 23 tools via the MCP protocol (stdio transport):

Execution

Tool	Description
`execute_code`	Run code in isolated sandbox (Python, Node.js, or Shell)
`install_package`	Install pip/npm packages safely
`validate_code`	Validate code before execution: syntax check, auto-fix imports, detect dangerous patterns
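For Python input, a syntax check with dangerous-pattern detection can be built on the `ast` module. The sketch below is illustrative only: `validate_python` and the `DANGEROUS_CALLS` set are hypothetical and do not reflect the patterns the validate_code tool actually looks for.

```python
import ast

# Illustrative list, chosen for the example; not sandboxmcp's rule set.
DANGEROUS_CALLS = {"eval", "exec", "compile", "__import__"}


def validate_python(source: str) -> dict:
    """Return syntax status plus warnings for direct calls to names in DANGEROUS_CALLS."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return {"valid": False, "error": f"line {exc.lineno}: {exc.msg}", "warnings": []}

    warnings = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DANGEROUS_CALLS
        ):
            warnings.append(f"line {node.lineno}: call to {node.func.id}()")
    return {"valid": True, "error": None, "warnings": warnings}
```

Static checks like this catch mistakes early and flag obvious risks, but they are easy to evade (for example through `getattr` or aliasing), so they work as a pre-flight filter in front of the sandbox, not as a security boundary.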

Sessions

Tool	Description
`create_session`	Start a stateful sandbox session for multi-step execution
`terminate_session`	Kill a sandbox session and wipe state
`bridge_data`	Pass JSON data between Python and Node.js sessions

Configuration and Monitoring

Tool	Description
`list_runtimes`	List available languages and backends
`set_resource_limits`	Update CPU/RAM/timeout limits
`sandbox_stats`	Live telemetry: queue, sessions, resource usage
`inspect_state`	View env vars (secrets masked) and active sessions
`get_artifacts`	Retrieve output files with SHA-256 signatures

Security and Audit

Tool	Description
`request_egress`	Request human approval to open a network domain
`audit_execution`	Get immutable execution audit log

Web Tools

Tool	Description
`web_search`	Search the web using SearXNG, Yandex, Mojeek, or DuckDuckGo rotation
`fetch_webpage`	Fetch a URL and extract clean readable content as markdown
`browser_fetch`	Fetch a URL with full JS rendering via headless Chromium (Playwright)

Note: Web tools (web_search, fetch_webpage, browser_fetch) are provided by websearchmcp, which is lazily imported at call time. They require the optional browser extra — install with pip install "mcpaisuite-sandboxmcp[browser]" (or [all]). Without it, these tools raise an import error while the rest of sandboxmcp works normally.

Host Access

Tool	Description
`host_exec`	Execute a command on the host system (subject to security whitelist)
`request_host_access`	Request approval to run a command pattern on the host
`list_host_access`	List allowed, approved, and blocked host command patterns
`host_file_read`	Read a file from the host machine filesystem
`host_file_write`	Write content to a file on the host machine filesystem
`host_file_list`	List files in a directory on the host machine filesystem
`host_file_copy`	Copy a file from the host machine into the workspace
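An allowed/approved/blocked pattern scheme for host commands could be modeled with shell-style matching. This is a sketch, assuming patterns behave like globs (as the `"docker restart *"` style suggests); `HostCommandPolicy` is a hypothetical class, and the real guard's matching rules may be stricter.

```python
from fnmatch import fnmatchcase


class HostCommandPolicy:
    """Hypothetical policy: blocked patterns win, allowed patterns pass, the rest need approval."""

    def __init__(self, allowed=(), blocked=()) -> None:
        self.allowed = list(allowed)
        self.blocked = list(blocked)

    def check(self, command: str) -> str:
        """Return 'blocked', 'allowed', or 'needs_approval' for a command string."""
        if any(fnmatchcase(command, pattern) for pattern in self.blocked):
            return "blocked"
        if any(fnmatchcase(command, pattern) for pattern in self.allowed):
            return "allowed"
        return "needs_approval"

    def approve(self, pattern: str) -> None:
        """Record a human-approved pattern so matching commands pass from now on."""
        self.allowed.append(pattern)
```

Glob matching on raw command strings is permissive: `docker restart x; rm -rf /` also matches `docker restart *`. A production guard would need to parse the command into arguments and avoid handing the string to a shell.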

CLI

sandboxmcp provides 23 CLI commands with full parity to the 23 MCP tools:

# Start MCP server (stdio transport)
sandboxmcp serve
sandboxmcp serve --transport sse --port 8080
sandboxmcp serve --config sandbox_config.yaml

# Execute a script file (auto-detects language from extension)
sandboxmcp run script.py
sandboxmcp run app.js --language node --timeout 120
sandboxmcp run deploy.sh --backend docker

# View job queue status
sandboxmcp queue status

# Manage sandbox sessions
sandboxmcp session list
sandboxmcp session kill --id <session_id>

# View configuration
sandboxmcp config

# Manage secret vault
sandboxmcp vault add MY_KEY my_value --namespace prod
sandboxmcp vault list --namespace prod
sandboxmcp vault delete MY_KEY --namespace prod

# Show execution stats
sandboxmcp stats

# Verify artifact integrity
sandboxmcp verify output.csv a1b2c3d4e5f6...

# Validate code without execution
sandboxmcp validate script.py
sandboxmcp validate app.js --language node --no-auto-fix

# View audit log
sandboxmcp audit --namespace default --limit 100

# Web search and fetching
sandboxmcp web search "python asyncio tutorial"
sandboxmcp web fetch https://example.com
sandboxmcp web browser https://example.com --wait-for ".content" --screenshot

# Host system access
sandboxmcp host exec "docker ps"
sandboxmcp host read /etc/hostname
sandboxmcp host write /tmp/note.txt "hello"
sandboxmcp host list /var/log
sandboxmcp host copy /host/data.csv workspace_data.csv
sandboxmcp host access list
sandboxmcp host access request "docker restart *"

# Inspect sandbox session state
sandboxmcp inspect default --namespace prod

# Get execution artifacts
sandboxmcp artifacts <request_id>

# Bridge data into a session
sandboxmcp bridge <session_id> key '{"value": 42}'

# Set resource limits
sandboxmcp limits --ram 1024 --cpu 2 --timeout 120 --processes 20

# Approve a domain for network egress
sandboxmcp egress api.example.com --namespace default

SandboxFactory

from sandboxmcp import SandboxFactory

# Zero config — process backend, in-memory vault, no network
sandbox = SandboxFactory.default()

# Read config from environment variables
sandbox = SandboxFactory.from_env()

# Read config from a YAML file
sandbox = SandboxFactory.from_yaml("sandbox_config.yaml")

# Fully configurable
sandbox = SandboxFactory.create(
    default_backend="process",       # "process" | "docker"
    max_concurrent_jobs=4,
    enable_network=False,
    allowed_domains=["api.example.com"],
    vault="memory",                  # "memory" | "env"
    audit="sqlite",                  # "memory" | "sqlite"
    max_ram_mb=512,
    max_cpu_cores=1,
    timeout_seconds=60,
    max_processes=10,
    sqlite_path="sandboxmcp.db",
)

Environment variables

Variable	Default	Description
`SANDBOXMCP_BACKEND`	`process`	Execution backend (`process` / `docker`)
`SANDBOXMCP_MAX_CONCURRENT`	`4`	Max concurrent jobs
`SANDBOXMCP_NETWORK`	`false`	Enable network access
`SANDBOXMCP_VAULT`	`memory`	Vault backend (`memory` / `env`)
`SANDBOXMCP_AUDIT`	`sqlite`	Audit backend (`memory` / `sqlite`)
`SANDBOXMCP_MAX_RAM_MB`	`512`	Default RAM limit per execution
`SANDBOXMCP_TIMEOUT`	`60`	Default timeout in seconds
`SANDBOXMCP_SQLITE_PATH`	`sandboxmcp.db`	SQLite database path
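Reading these variables with their documented defaults comes down to a handful of lookups. The sketch below uses the variable names and defaults from the table; the `EnvConfig` dataclass, the `load_config` function, and the accepted truthy spellings are assumptions made for illustration, not the from_env() implementation.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    backend: str
    max_concurrent: int
    network: bool
    vault: str
    audit: str
    max_ram_mb: int
    timeout: int
    sqlite_path: str


def load_config(env=None) -> EnvConfig:
    """Build a config from a mapping (defaults to os.environ), applying the table's defaults."""
    env = os.environ if env is None else env
    # Assumption: these spellings count as "true"; the library may accept others.
    network_raw = env.get("SANDBOXMCP_NETWORK", "false").strip().lower()
    return EnvConfig(
        backend=env.get("SANDBOXMCP_BACKEND", "process"),
        max_concurrent=int(env.get("SANDBOXMCP_MAX_CONCURRENT", "4")),
        network=network_raw in {"1", "true", "yes", "on"},
        vault=env.get("SANDBOXMCP_VAULT", "memory"),
        audit=env.get("SANDBOXMCP_AUDIT", "sqlite"),
        max_ram_mb=int(env.get("SANDBOXMCP_MAX_RAM_MB", "512")),
        timeout=int(env.get("SANDBOXMCP_TIMEOUT", "60")),
        sqlite_path=env.get("SANDBOXMCP_SQLITE_PATH", "sandboxmcp.db"),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to test without touching the process environment.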

Events

sandboxmcp emits 14 event types through an async event bus:

Event	Trigger
`execution.started`	Code execution begins
`execution.completed`	Code execution succeeds
`execution.failed`	Code execution fails (non-zero exit)
`execution.timeout`	Execution killed after timeout
`session.created`	New sandbox session created
`session.killed`	Session terminated
`package.installed`	pip/npm package installed
`egress.requested`	Network domain access requested
`egress.approved`	Domain access approved
`egress.denied`	Domain access denied
`vault.accessed`	Secret vault read
`resource.exceeded`	Resource limit breached
`artifact.signed`	Artifact SHA-256 signed
`queue.backpressure`	Job queue at capacity

from sandboxmcp.events import sandbox_event_bus, SandboxEventType

# Subscribe to all events for a namespace
queue = sandbox_event_bus.subscribe(namespace="default")

# Subscribe to all events globally
queue = sandbox_event_bus.subscribe()

# Stream events asynchronously
async for event in sandbox_event_bus.stream(namespace="default"):
    print(f"{event.type}: {event.message}")
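The subscribe-by-namespace behavior can be pictured as a fan-out over per-subscriber queues. This is a minimal sketch, assuming hypothetical `Event` and `EventBus` classes; it mirrors the subscribe() calls above but is not the library's event bus.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    type: str
    namespace: str
    message: str


class EventBus:
    """Hypothetical fan-out bus: every subscriber owns an asyncio.Queue."""

    def __init__(self) -> None:
        self._subscribers = []  # (namespace filter or None, queue)

    def subscribe(self, namespace: Optional[str] = None) -> asyncio.Queue:
        """Return a queue receiving events for one namespace, or all if None."""
        queue = asyncio.Queue()
        self._subscribers.append((namespace, queue))
        return queue

    def emit(self, event: Event) -> None:
        """Deliver the event to every subscriber whose filter matches."""
        for namespace, queue in self._subscribers:
            if namespace is None or namespace == event.namespace:
                queue.put_nowait(event)


async def demo() -> tuple:
    bus = EventBus()
    scoped = bus.subscribe(namespace="default")
    everything = bus.subscribe()
    bus.emit(Event("execution.started", "default", "run 1"))
    bus.emit(Event("execution.started", "prod", "run 2"))
    first = await scoped.get()
    return scoped.qsize(), everything.qsize(), first.message
```

Giving each subscriber its own queue means a slow consumer delays only itself; with unbounded queues, as here, the trade-off is memory growth if a subscriber never drains its events.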

Integration with the MCP suite

sandboxmcp is designed to work alongside the other four libraries in the MCP AI suite:

Library	Integration
ragmcp	Execute code that queries RAG pipelines; use sandbox to run generated data-processing scripts
memorymcp	Store execution results as facts; remember which code patterns succeeded or failed
planningmcp	Plan multi-step execution workflows; sandbox each step with resource limits
workspacemcp	Read/write workspace files, then execute them in sandboxmcp for safe testing

All five libraries expose MCP servers on stdio, so they can run side-by-side in Claude Desktop or any MCP-compatible agent:

{
  "mcpServers": {
    "sandboxmcp": { "command": "sandboxmcp", "args": ["serve"] },
    "memorymcp": { "command": "memorymcp", "args": ["serve"] },
    "ragmcp": { "command": "ragmcp", "args": ["serve"] },
    "planningmcp": { "command": "planningmcp", "args": ["serve"] },
    "workspacemcp": { "command": "workspacemcp", "args": ["serve"] }
  }
}

Development / Contributing

git clone https://github.com/gashel01/sandboxmcp
cd sandboxmcp
pip install -e ".[dev]"

# Unit tests — no external services needed
pytest tests/unit/ -v                            # 398 tests

# With coverage
pytest tests/unit/ --cov=sandboxmcp --cov-report=html

Project structure:

sandboxmcp/
  core/
    models.py        — Pydantic models (ExecutionRequest, SandboxResult, Session, Artifact, AuditEntry)
    base.py          — Abstract base classes (BaseBackend, BaseVault, BaseNetworkGuard, BaseAuditLogger)
  backends/
    process_rt.py    — Process backend (subprocess + resource guard + network blocking)
    docker_rt.py     — Docker backend (container isolation + resource limits)
  runtimes/
    python_env.py    — Python runtime (pip install)
    node_env.py      — Node.js runtime (npm install)
    shell_env.py     — Shell runtime (bash -r restricted mode)
  security/
    vault.py         — InMemoryVault + EnvVault (secret injection + output masking)
    egress.py        — NetworkGuard (DNS allowlist, default deny-all, human-in-the-loop approval)
    resource.py      — ResourceGuard (rlimit on Linux/macOS, Job Objects on Windows)
    crypto.py        — ArtifactSigner (SHA-256 signing + verification)
    audit.py         — InMemoryAuditLogger + SQLiteAuditLogger (immutable execution log)
    validator.py     — CodeValidator (syntax check, auto-fix, dangerous pattern detection)
    host_guard.py    — HostGuard (host command execution with approval workflow)
  pipeline/
    manager.py       — SandboxPipeline (central orchestrator)
    queue.py         — AsyncJobQueue (semaphore-based backpressure)
  utils/
    web_extractor.py — WebExtractor (HTML to markdown content extraction)
    browser_fetch.py — Browser rendering via Playwright (headless Chromium)
  integration/       — Suite integration helpers
  events.py          — Event bus (14 event types, subscribe/emit/stream)
  mcp_server.py      — MCP server (23 tools, stdio transport)
  factory.py         — SandboxFactory (default / create / from_env / from_yaml)
  cli.py             — CLI (23 commands: serve, run, queue, session, config, vault, stats, verify, validate, audit, web {search,fetch,browser}, host {exec,read,write,list,copy,access}, inspect, artifacts, bridge, limits, egress)

Contributions are welcome. Please open an issue before submitting a large PR.

License

Apache-2.0 — see LICENSE.

For commercial licensing (closed-source usage), contact the author.