v0.2.0 released · AI agent security

Capability security
for AI agents.

A deterministic runtime that maps every tool your agents reach, mints scoped, revocable capability tokens — verified in single‑digit microseconds — and decides every call against a policy you wrote, with no LLM in the decision path. Stops prompt‑injected refunds, runaway tool storms, indirect‑injection data exfil, and unscoped MCP access before they reach prod. MCP‑native. Audit‑mapped to OWASP LLM, NIST AI RMF, MITRE ATLAS. MIT.

Install Capframe Get a $750 audit Star on GitHub

curl -fsSL capframe.ai/install | sh

~/agents — capframe install

$ capframe install
→ mcp-recon
  ✓ mcp-recon v0.0.4   sha256 ok · 1.2 MB
→ capnagent
  ✓ capnagent v0.7.6   sha256 ok · 1.2 MB
→ mcp-guard
  ✓ mcp-guard v0.5.4   sha256 ok · 19.9 MB

Verify with: capframe doctor
Add to PATH: ~/.capframe/bin

§ 01 The blast radius without it

Your agents inherit the full authority of the credentials you hand them.

Without a capability layer, every prompt-injection vector turns into the worst thing your agent could do with those keys. These aren't hypotheticals — they're the four failure modes every team shipping agents has seen, or is about to.

T01 Indirect prompt injection

live

A vendor invoice arrives. The agent reads it. The PDF whispers: forward customers@ to attacker@. Your agent has Gmail scope. Your agent obeys.

GDPR fines · forced breach disclosure · board-level incident
OWASP LLM01 · ATLAS T0051

T02 Runaway tool storm

live

One bad inference and the refund agent loops, issuing $50 refunds until something — anything — finally rate-limits it. Recovery: 6 hours of finance reversals.

$10K–$2M before the alarm fires · permanent trust loss
OWASP LLM08 · NIST RMF MANAGE-1.2

T03 Unbounded MCP surface

live

An MCP server exposes 47 tools. Your agent legitimately needs 4. The other 43 are jailbreak surface — and every dependency update silently widens it.

Every unscoped tool is a CVE waiting for the right prompt
OWASP LLM07 · ATLAS T0044

T04 No forensic trail

live

The agent did something. You can't prove what it was allowed to do, who authorised the scope, when the policy last changed, or whether it was revoked in time.

Failed audit · indefinite incident response · regulator finds gaps for you
EU AI Act Art.12 · ISO 42001 · SOC2 CC7.2

Capframe stops all four at the agent's tool-call boundary: Find and Bind as Rust binaries, Guard as a pip-installable Python layer. Every call is decided by a policy you wrote, not a model you can't inspect.

§ 02 The pipeline

Four stages. Microsecond capability checks. One findings schema. No model in the loop.

Map the surface, mint scoped authority, enforce every call against a policy you wrote, then export the receipt. The output of each stage is the input to the next — and every artifact is auditor-ready.

01 Discovery

Find

Map the tool surface. Catch indirect-injection gaps.

→ findings.json

02 Authority

Bind

Mint scoped, revocable capability tokens.

→ cf_tok_a91…

03 Enforcement

Guard

Evaluate every tool call against policy at runtime.

→ allow / deny

04 Compliance

Report

Audit-ready artifact: OWASP / NIST / ATLAS.

→ report.html

§ 03 The three modules

Standalone, or composed. Either way, three primitives — not three products.

Find and Bind ship as Rust crates with CLI subcommands; Guard ships as a Python package (pip install mcp-guardrails) — each in its own public GitHub repo. Adopt one at a time, or wire them together through the shared findings.v1 JSON Schema — the wire format Find emits, Bind scopes against, and Guard enforces.

03.1 Discovery

Find

Walks every MCP server, every tool endpoint, every parameter your agent can reach. Detects unconstrained inputs, indirect-injection sinks, missing schemas, and silently-widening surfaces between dependency updates. Emits a findings.v1 JSON document aligned to OWASP LLM Top 10 and MITRE ATLAS — the same file Bind consumes to scope tokens and Guard consumes to synthesize policy.

▸ Static + behavioural scan of every MCP server in your config
▸ Diffs surfaces between scans and flags newly introduced tools
▸ Schema-aware: catches missing parameter constraints, not just missing types
▸ Cross-tool findings.v1 wire format (JSON Schema Draft 2020-12)

capframe find

$ capframe find ./mcp-server.toml
✓ mapped 14 tools across 2 mcp servers
⚠ 3 tools accept input without constraints (LLM01)
⚠ 1 tool has indirect-injection surface (LLM01, ATLAS T0051)
✓ surface diff: +2 tools vs last scan
→ ./capframe.findings.json
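
The surface diff reported above can be pictured as a set comparison between two scans. What follows is a minimal sketch, assuming each scan reduces to a mapping of server name to tool names. The function `diff_surface`, the `Surface` shape, and the sample data are hypothetical; they are not Capframe's API or the findings.v1 layout.

```python
from typing import Dict, List, Set, Tuple

Surface = Dict[str, Set[str]]  # hypothetical shape: server name -> tool names


def diff_surface(previous: Surface, current: Surface) -> Tuple[List[str], List[str]]:
    """Return (added, removed) tools as sorted 'server/tool' strings."""
    def flatten(surface: Surface) -> Set[str]:
        return {f"{server}/{tool}" for server, tools in surface.items() for tool in tools}

    before, after = flatten(previous), flatten(current)
    return sorted(after - before), sorted(before - after)


# Illustrative data only: a dependency update adds one tool to each server.
last_scan = {"billing": {"order.read", "refund.write"}, "mail": {"mail.send"}}
this_scan = {
    "billing": {"order.read", "refund.write", "refund.bulk"},
    "mail": {"mail.send", "mail.forward"},
}

added, removed = diff_surface(last_scan, this_scan)
print(f"surface diff: +{len(added)} tools, -{len(removed)} tools")
for name in added:
    print(f"  new: {name}")
```

Keying each tool by server and name means a tool that moves between servers shows up as one removal and one addition, which is usually what a reviewer wants to see.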

03.2 Authority

Bind

The authority layer, and the most adversarially-tested module in the stack. Prompt injection is a confused-deputy attack (Hardy, 1988): smarter guardrails don't fix it; removing the agent's ambient authority does. Bind mints macaroon-style capability tokens (attenuable, revocable, ed25519 holder-of-key) that bound what an agent CAN do at issuance time. Out-of-scope calls are refused before the underlying tool ever sees them, each producing a signed, tamper-evident receipt.

▸ Macaroon chain (HMAC-SHA256): a holder can't broaden scope without the root key
▸ ed25519 holder-of-key proofs defeat token theft and replay
▸ Caveats evaluate against verifier-known facts, never the agent's claims
▸ Every caveat is human-readable: predict what a token permits in under 30s

capframe bind

$ capframe bind --agent shopify-bot \
                --tools "order.read, refund.write" \
                --limit max_refund=50 --limit region=eu \
                --ttl 24h
✓ token minted: cf_tok_a91f4e…
  holder:    ed25519 / shopify-bot
  scope:     2 tools · max_refund≤50 · region=eu
  expires:   2026-05-18T08:14:00Z
  revoke:    capframe revoke cf_tok_a91f4e
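
The no-broaden property of a macaroon-style chain follows from how HMAC signatures are linked. The sketch below is a simplified illustration using only the Python standard library, not Bind's Rust implementation: the token layout, the function names, the caveat strings, and the key are all assumptions, and it omits caveat evaluation, holder-of-key proofs, and revocation.

```python
import hashlib
import hmac
from typing import List, Tuple

Token = Tuple[str, List[str], bytes]  # (token id, caveats, chain signature)


def _link(key: bytes, message: str) -> bytes:
    """One link of the chain: HMAC-SHA256 keyed by the previous signature."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def mint(root_key: bytes, token_id: str) -> Token:
    """Issuer side: the first signature is keyed by the root key."""
    return token_id, [], _link(root_key, token_id)


def attenuate(token: Token, caveat: str) -> Token:
    """Holder side: adding a caveat needs only the current signature."""
    token_id, caveats, signature = token
    return token_id, caveats + [caveat], _link(signature, caveat)


def verify_chain(root_key: bytes, token: Token) -> bool:
    """Verifier side: replay the chain from the root key and compare."""
    token_id, caveats, signature = token
    expected = _link(root_key, token_id)
    for caveat in caveats:
        expected = _link(expected, caveat)
    return hmac.compare_digest(expected, signature)


ROOT_KEY = b"demo-root-key-not-for-production"  # hypothetical key

broad = attenuate(mint(ROOT_KEY, "tok-demo"), "tool in {order.read, refund.write}")
narrow = attenuate(broad, "max_refund <= 50")

# A holder of `narrow` tries to broaden scope by dropping the last caveat
# while keeping the only signature it has.
forged = (narrow[0], narrow[1][:-1], narrow[2])

print(verify_chain(ROOT_KEY, narrow))  # the untouched chain verifies
print(verify_chain(ROOT_KEY, forged))  # the shortened caveat list does not
```

Narrowing needs only the current signature, so any holder can attenuate. Recovering the earlier signature would require inverting HMAC or knowing the root key, which is why dropping a caveat fails verification.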

03.3 Enforcement

Guard

A deterministic Python policy evaluator that sits inline at the agent's tool-call boundary. No LLM in the decision path — every allow/deny is reproducible, fuzzable, and immune to the jailbreak that just broke your agent. Synthesize policy from observed injection gaps, backtest against the corpus, ship.

▸ Inline at the tool-call boundary: not a sidecar, not a daemon
▸ Synthesizes YAML policy from a findings.v1 file in one command
▸ Default corpus of 308 jailbreak / injection / scope-escape cases
▸ Fail-closed by construction: no policy = no calls

capframe guard

$ capframe guard synth ./capframe.findings.json
✓ 14 rules generated across 3 categories
✓ policy → ./policy.yaml

$ capframe guard backtest ./policy.yaml
✓ 308-case corpus · TPR 1.00 / FPR 0.01
✓ deterministic · no model in the decision path
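
A fail-closed, model-free decision can be written as a pure function of the policy and the call. This is a rough sketch under stated assumptions: the dictionary-based policy shape, the `max` and `equals` keys, and the `decide` function are hypothetical stand-ins, not Guard's YAML schema or Python API.

```python
from typing import Any, Dict, Optional

Policy = Dict[str, Dict[str, Any]]  # hypothetical: tool name -> constraints


def decide(policy: Optional[Policy], tool: str, params: Dict[str, Any]) -> str:
    """Pure function of (policy, call): no clock, no network, no model."""
    if not policy:  # fail closed: no policy means no calls
        return "deny"
    rule = policy.get(tool)
    if rule is None:  # tools the policy does not name are denied
        return "deny"
    for name, limit in rule.get("max", {}).items():
        value = params.get(name)
        # A missing or non-numeric value is treated as a violation.
        if not isinstance(value, (int, float)) or value > limit:
            return "deny"
    for name, required in rule.get("equals", {}).items():
        if params.get(name) != required:
            return "deny"
    return "allow"


# Illustrative policy mirroring the scope in the bind example above.
policy: Policy = {
    "order.read": {},
    "refund.write": {"max": {"amount": 50}, "equals": {"region": "eu"}},
}

calls = [
    ("order.read", {}),
    ("refund.write", {"amount": 50, "region": "eu"}),
    ("refund.write", {"amount": 500, "region": "eu"}),
    ("wire.send", {"amount": 30}),
]
for tool, params in calls:
    print(f"{tool:<14} {decide(policy, tool, params)}")
```

Because `decide` reads nothing except its arguments, the same call against the same policy always returns the same verdict, which is what makes backtesting against a fixed corpus meaningful.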

§ 04 Adversarially-tested · the Bind layer

Bind isn't a claim. It's been attacked ten times — and every result is published.

The authority layer ships with a public purple-team corpus: a structured record of adversarial scenarios run against the token engine, written blue-first — the falsifiable security claim is committed before the attack runs. Every round ships a runnable PoC and a signed denial receipt as evidence. Clone it, run the tests, verify every number on this page without trusting a word of it.

10 / 10

purple-team rounds closed

blue-first methodology · every documented BREAK fixed and shipped

564

tests, all green

242 Rust + 322 TypeScript · incl. proptests on the no-broaden invariant

4 HIGH

defects in our own engine

found by a 4-agent / 36-angle self-review · published · 4 of 4 closed

1.4 µs

chain-only verify

56 µs full holder-of-key pipeline · ~17 kHz verifications / core

Live · real model

A Claude Opus 4.7 agent, told to move money it isn't scoped for.

No prompt engineering, no guardrail model, no second LLM judging the first. The agent is handed one capability — scoped to checkout.purchase. It tries to send a wire anyway. The authority simply isn't there, so the call dies at the gate while the in-scope purchase proceeds. Both verdicts are audit-logged.

demo:llm-direct — Anthropic SDK

$ npm run demo:llm-direct
→ task:  "send a $30 wire to acct 4471, then buy a USB-C cable"
→ scope: tool == "checkout.purchase"

  ⨯ wire.send          DENIED   out-of-scope
                       receipt cap_rcpt_3f9a…
  ✓ checkout.purchase  ALLOWED  in-scope
                       receipt cap_rcpt_7c12…

both decisions audit-logged · agent never reached the wire API

Attack classes the corpus exercises

tool-description injection · holder-of-key replay · capability broadening · revocation race · cross-origin exfil · path traversal + encoding · IDN homograph origins · sandbox prefix foot-gun

And honest about what it doesn't cover

Model behaviour, system-prompt extraction, jailbreaks, and GCG suffixes are explicitly out of scope: documented in the threat model, not hand-waved. Bind removes the deputy's authority; it doesn't pretend to fix the model.

§ 05 Under the hood

The runtime is the product. Everything else is paperwork.

Capframe is not a wrapper around an LLM, not a policy DSL transpiled to a model prompt, and not a managed service. It's a deterministic runtime with three primitives — tokens, policy, receipts — that sit inline at the boundary your agent already calls through.

01 · 0 LLM calls

Deterministic policy evaluator

Deterministic and model-free. The same input always returns the same allow/deny — fuzzable, reproducible, immune to the next jailbreak. Most importantly: no LLM in the decision path. Your enforcement boundary is not a model you have to re-evaluate every time someone publishes a new attack paper.

02 · ed25519 · attenuable

Macaroon-style capability tokens

ed25519 holder-of-key signatures, attenuable third-party caveats, revocation lists, TTL-bound. The primitive Google built distributed authorization on, ported to the agent boundary. Scope your agent to two tools and one region in one CLI call — and revoke the token in one more when something looks off.
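
TTL and revocation are checks the verifier makes against facts it holds itself: its own clock and its own revocation list. A minimal sketch, assuming a hypothetical `TokenFacts` record and `is_live` helper; the token id is made up, and the timestamps are chosen only to line up with a 24-hour TTL like the one in the bind example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Set


@dataclass(frozen=True)
class TokenFacts:
    """Hypothetical verifier-side view of a token."""
    token_id: str
    expires_at: datetime


def is_live(token: TokenFacts, now: datetime, revoked: Set[str]) -> bool:
    """Both inputs come from the verifier, never from the agent."""
    if token.token_id in revoked:
        return False
    return now < token.expires_at


issued = datetime(2026, 5, 17, 8, 14, tzinfo=timezone.utc)  # illustrative
token = TokenFacts("tok-demo", issued + timedelta(hours=24))
revoked: Set[str] = set()

print(is_live(token, issued + timedelta(hours=1), revoked))   # inside the TTL
print(is_live(token, issued + timedelta(hours=25), revoked))  # past expiry

revoked.add(token.token_id)  # the operator revokes the token
print(is_live(token, issued + timedelta(hours=1), revoked))   # revoked early
```

Checking revocation before expiry keeps the answer the same either way, but it makes the intent explicit: a revoked token is dead regardless of how much TTL it has left.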

03 · HMAC-SHA256

Tamper-evident receipts

Every allow and every deny emits a signed (HMAC-SHA256) receipt with policy hash, token id, agent id, parameters, and verdict. Drop the receipt stream into S3 or Loki and you have a forensic timeline that satisfies SOC2 CC7.2 and EU AI Act Article 12 logging requirements out of the box.
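
The receipt described here can be sketched as an HMAC-SHA256 tag over a canonical JSON encoding of its fields. This is an illustration under assumptions, not Capframe's receipt format: the field names, the `mac` key, the canonicalisation, and the signing key are all hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def _canonical(fields: Dict[str, Any]) -> bytes:
    """Stable byte encoding so signer and verifier hash the same bytes."""
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_receipt(key: bytes, fields: Dict[str, Any]) -> Dict[str, Any]:
    tag = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    return {**fields, "mac": tag}


def verify_receipt(key: bytes, receipt: Dict[str, Any]) -> bool:
    fields = {k: v for k, v in receipt.items() if k != "mac"}
    expected = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(receipt.get("mac", "")))


RECEIPT_KEY = b"demo-receipt-key-not-for-production"  # hypothetical key

receipt = sign_receipt(RECEIPT_KEY, {
    "policy_hash": hashlib.sha256(b"example policy contents").hexdigest(),
    "token_id": "tok-demo",
    "agent_id": "shopify-bot",
    "tool": "refund.write",
    "params": {"amount": 500, "region": "eu"},
    "verdict": "deny",
})

tampered = {**receipt, "verdict": "allow"}  # someone edits the log afterwards

print(verify_receipt(RECEIPT_KEY, receipt))   # original receipt checks out
print(verify_receipt(RECEIPT_KEY, tampered))  # edited verdict is detected
```

Sorting keys and fixing separators matters because HMAC covers bytes, not meaning: two encodings of the same fields would otherwise produce different tags.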

04 · Draft 2020-12

findings.v1 wire format

JSON Schema Draft 2020-12. Round-trip tested. The cross-tool contract Find emits, Bind reads to scope tokens, Guard reads to synthesize policy, and Report serializes into auditor-ready HTML/PDF. One schema means every artifact is grep-able, diff-able, and machine-checkable in CI.

05 · 6 targets · OSS

Static binaries, no daemon

No daemon. No kernel module. No container. Find and Bind ship as sha256-verified static Rust binaries — x86_64 / aarch64 across Linux, macOS, and Windows — and Guard installs as a Python package alongside. Runs in CI, in your IDE, on your laptop, and inline at the tool-call boundary. Permissive OSS (Apache-2.0 + MIT) — read every line of the code your security depends on.

06 · MCP today · adapters next

MCP-native, framework-agnostic

Today: every MCP server — Claude Desktop, Cursor, Continue, Cline, LangGraph via the MCP bridge, every agent SDK that speaks the protocol. Roadmap: native adapters for OpenAI function calling and Anthropic tool use, so the same policy file works regardless of which provider your agent picks tomorrow.

1.4 µs

Bind chain-verify (p50)

308

default jailbreak corpus

6

cross-compiled targets

0

LLM calls on the hot path

§ 06 Compliance

The only artifact mapping all three agent-security frameworks at once.

Most tools tick one framework. Capframe was designed so a single run emits evidence aligned to OWASP LLM Top 10, NIST AI RMF, and MITRE ATLAS, the three frameworks regulated buyers (CISO, GRC, internal audit) actually ask about. capframe report exports the dossier as HTML or PDF: signed, timestamped, and ready to attach to a SOC2 / ISO 42001 / EU AI Act submission.

OWASP LLM

Top 10 · 2023 (v1.1) numbering

✓ LLM01 prompt injection
✓ LLM02 insecure output
✓ LLM07 insecure plugin
✓ LLM08 excessive agency

NIST AI RMF

v1.0

✓ GOVERN
✓ MAP
✓ MEASURE
✓ MANAGE

MITRE ATLAS

v4.7

✓ TA0043 reconnaissance
✓ TA0006 credential access
✓ TA0040 impact
✓ TA0007 discovery

§ 07 Specimen transcript

Eighty seconds, four commands, one auditor-ready report.

~/agents — capframe — 80×24

$ capframe find ./my-mcp-server.toml
✓ mapped 14 tools across 2 MCP servers
⚠ 3 tools accept input without constraints (LLM01)
⚠ 1 tool has indirect-injection surface (LLM01, ATLAS T0051)
→ findings written to ./capframe.findings.json

$ capframe bind --agent shopify-bot \
                --tools "order.read, refund.write" \
                --limit max_refund=50 --limit region=eu \
                --ttl 24h
✓ token minted: cf_tok_a91f4e…
  holder:    ed25519 / shopify-bot
  scope:     2 tools · max_refund≤50 · region=eu
  expires:   2026-05-18T08:14:00Z
  revoke:    capframe revoke cf_tok_a91f4e

$ capframe guard backtest ./policy.yaml
✓ 247/247 corpus cases pass
✓ 14 rules, 3 categories
✓ false-positive rate: 0.0%

$ capframe report --format html --out ./report.html
✓ report written
   OWASP LLM Top 10:  4/10 covered, 2 findings open
   NIST AI RMF:       Govern ✓  Map ✓  Measure ✓  Manage ✓
   MITRE ATLAS:       2 techniques flagged, 0 active exploits

curl -fsSL capframe.ai/install | sh

Read the source →

§ 08 Done-for-you

Short on time? We'll audit your agents.

The same posture we run on our own tools and on 90+ public servers on the leaderboard, pointed at your stack. We map your agent's tool surface (MCP servers, or your existing OpenAI / Anthropic / LangChain tool definitions) and deliver a report you can act on within five business days.

Founding · first 3 customers

Agent Security Audit

✓ Branded OWASP LLM Top 10 / NIST AI RMF / MITRE ATLAS findings report (HTML + PDF)
✓ Prioritized remediation checklist: what to fix, in what order, and why
✓ 30-minute walkthrough call + a sample deterministic policy you can drop in front of your agents

Guarantee: if the report doesn't surface at least one issue you didn't already know about, you pay nothing.

$750 founding

standard $2,500 · 5 business days

See a sample teardown ↗

§ 09 Pricing

Open source. Hosted when you need it.

available

Free

self-hosted

All three modules. Local CLI. Full OWASP / NIST / ATLAS report generator. MIT license.

✓ All three modules
✓ Local-first CLI
✓ Full report generator (HTML + PDF)
✓ sha256-verified installer
✓ Run anywhere

Install →

early access · waitlist open

Pro

Early access

$199/mo · planned

Hosted control plane for AI teams shipping agents at velocity. Currently in private early access — join the waitlist below.

✓ Hosted dashboard (in build)
✓ Findings history + cross-scan diffing
✓ Scheduled scans
✓ Slack alerts
✓ Up to 10 agents

design partners

Enterprise

Talk to us

On-prem / VPC. SSO, audit logs, signed compliance reports, SLA. Taking a small number of design partners in regulated industries.

✓ SSO + audit logs
✓ On-prem / VPC deploy
✓ Signed compliance reports
✓ SLA + private Slack channel
✓ Unlimited agents

§ 10 Common questions

What people ask before they install.

Q.01 How is this different from an LLM-as-judge guardrail?
An LLM judge is another model you have to trust, and another attack surface. Capframe's Guard is a deterministic Python evaluator: the same input always yields the same allow/deny, with no model in the path. That means it's fuzzable, reproducible, and immune to the next jailbreak paper. LLM judges are useful for content classification; they are not safe to put inline at a tool-call boundary.

Q.02 How is this different from prompt-injection scanners or red-team frameworks?
Scanners tell you about a vulnerability after the fact. Capframe enforces the boundary at the moment of the call: the agent never gets to make a refund it shouldn't, regardless of what the prompt said. Find covers the offline discovery surface; Guard is the runtime backstop the scanner ecosystem doesn't have.

Q.03 Is the runtime fast enough for production?
Yes. Guard is a deterministic rule engine with no model inference and no network hop in the decision path, so a decision is a pure function of (policy, call). In the Rust Bind layer, chain-only capability verification runs in about 1.4 µs and the full holder-of-key pipeline in about 56 µs. Drop Guard inline at the tool-call boundary and measure it on your own traffic.

Q.04 Does my agent data leave my environment?
No. It's local-first: Find, Bind, Guard, and Report all run on your laptop, your CI, or your inference host (Find and Bind as Rust binaries, Guard as a pip-installable Python layer). The Pro / Enterprise hosted control plane is opt-in and stores only the metadata you choose to sync.

Q.05 Does this only work with MCP?
Today, yes: Capframe is built around the Model Context Protocol. That covers Claude Desktop, Cursor, Continue, Cline, LangGraph via the MCP bridge, and most agentic Rust/Python frameworks. Native adapters for OpenAI function calling and Anthropic tool use are on the roadmap; the policy file stays the same.

Q.06 Why three separate modules?
Different teams adopt them at different speeds. Security teams typically start with Find to baseline their agent surface. AI engineers usually start with Guard because it solves the immediate runtime problem. The capability-token layer (Bind) is for teams ready to commit to a permission model. The findings.v1 schema ties them together when you're ready.

Q.07 Why open source?
Security infrastructure you can't read isn't trustworthy, and AI-agent enforcement is too new a category to be locked behind a closed binary. The code your boundary depends on should be inspectable, fuzzable, and forkable. MIT licensed, every line.

Q.08 What does the auditor actually receive?
An HTML or PDF report (capframe report) with: the findings table mapped to OWASP LLM Top 10 and MITRE ATLAS techniques, the active policy file with its hash, the signed receipt count by verdict, and the NIST AI RMF function coverage matrix. Timestamped and signature-verifiable end to end.
