Multi-Agent Wiki

Safety, Permissions, and Guardrails

Security boundary design for multi-agent platforms.

The biggest risk in a multi-agent platform is not a wrong answer. It is the uncontrolled chain of actions that emerges when multiple agents touch tools, the filesystem, shells, networks, and external systems.

Permission tiers

| Level | Action | Default policy |
| --- | --- | --- |
| L0 | Read-only context, read-only docs | Allow |
| L1 | File reads, search, static analysis | Allow, but logged |
| L2 | File writes, config changes | Requires policy check |
| L3 | Shell, dependency install, network calls | Requires approval or sandbox |
| L4 | git commit, deploy, database writes | Human approval required |
| L5 | Production, finance, permission changes | Denied by default or strict approval |
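The tiers above can be expressed as a lookup from action to default policy. The following is a minimal sketch, not a definitive implementation: the action names (`file.read`, `shell.exec`, and so on), the `Decision` labels, and the `decide` function are all hypothetical, and only the tier-to-policy mapping comes from the table.

```typescript
// Tiers L0..L5 from the table; the Decision labels are assumed names for the default policies.
type Tier = 0 | 1 | 2 | 3 | 4 | 5;
type Decision =
  | "allow"
  | "allow_logged"
  | "policy_check"
  | "approval_or_sandbox"
  | "human_approval"
  | "deny";

const defaultPolicy: Record<Tier, Decision> = {
  0: "allow",
  1: "allow_logged",
  2: "policy_check",
  3: "approval_or_sandbox",
  4: "human_approval",
  5: "deny",
};

// Hypothetical mapping from action kinds to tiers; a real platform would define its own.
const actionTier = new Map<string, Tier>([
  ["docs.read", 0],
  ["file.read", 1],
  ["file.write", 2],
  ["shell.exec", 3],
  ["git.commit", 4],
  ["prod.change", 5],
]);

function decide(action: string): Decision {
  // Unknown actions fall to the strictest tier rather than the most permissive one.
  const tier: Tier = actionTier.get(action) ?? 5;
  return defaultPolicy[tier];
}
```

Treating unlisted actions as L5 is a design choice in this sketch: a new tool that nobody has classified yet starts out denied instead of silently allowed.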

Guardrail types

| Type | Examples |
| --- | --- |
| Input guardrail | Detect prompt injection, out-of-scope requests |
| Tool guardrail | Restrict shell commands, network domains, file paths |
| Output guardrail | Detect leaks, dangerous advice, policy violations |
| Budget guardrail | Cap tokens, time, concurrency |
| State guardrail | Prevent blackboard pollution |
| Handoff guardrail | Prevent loops and unauthorized takeover |
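One way to picture these guardrails is as small checks that run in order before each agent step, with the first failure blocking the step. The sketch below covers only the budget and handoff rows. The `StepContext` shape, the `Guardrail` type, and every function name are assumptions made for illustration.

```typescript
// Hypothetical context passed to every guardrail before an agent acts.
interface StepContext {
  agent: string;          // agent about to take control
  handoffChain: string[]; // agents that have already held control, in order
  tokensUsed: number;     // tokens consumed so far in this run
}

type Verdict = { ok: true } | { ok: false; guardrail: string; reason: string };

// A guardrail returns null when the step may proceed, or a reason string when it may not.
type Guardrail = { name: string; check: (ctx: StepContext) => string | null };

const budgetGuardrail = (maxTokens: number): Guardrail => ({
  name: "budget",
  check: (ctx) =>
    ctx.tokensUsed > maxTokens ? `token cap ${maxTokens} exceeded` : null,
});

const handoffGuardrail = (maxDepth: number): Guardrail => ({
  name: "handoff",
  check: (ctx) => {
    // A repeated agent in the chain means control is cycling.
    if (ctx.handoffChain.includes(ctx.agent)) {
      return `loop: ${ctx.agent} already held control`;
    }
    if (ctx.handoffChain.length >= maxDepth) {
      return `handoff depth ${maxDepth} reached`;
    }
    return null;
  },
});

function runGuardrails(ctx: StepContext, guardrails: Guardrail[]): Verdict {
  for (const g of guardrails) {
    const reason = g.check(ctx);
    if (reason !== null) return { ok: false, guardrail: g.name, reason };
  }
  return { ok: true };
}
```

Returning the name of the guardrail that fired, along with a reason, gives the trace enough detail to explain why a step was blocked.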

Example shell policy

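```typescript
const shellPolicy = {
  // Read-only or test commands that may run without approval.
  // Note: `sed -i` edits files in place, so `sed` is not strictly read-only.
  allow: ["ls", "cat", "grep", "rg", "sed", "python -m pytest"],
  // Destructive or privilege-escalating patterns that are always rejected.
  deny: ["rm -rf", "sudo", "curl | sh", "chmod 777", "docker system prune"],
  // Commands with external side effects that need a human sign-off first.
  requireApproval: ["git push", "npm publish", "kubectl", "terraform apply"],
};
```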
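The policy above does not say how a command is matched against its lists. A minimal sketch of one possible evaluator follows, under these assumptions: entries match as whole-token prefixes, deny beats approval beats allow, anything unlisted is denied, and compound commands are refused outright. `evaluateShell`, `hasTokenPrefix`, and `examplePolicy` are hypothetical names.

```typescript
type ShellDecision = "allow" | "deny" | "require_approval";

interface ShellPolicy {
  allow: string[];
  deny: string[];
  requireApproval: string[];
}

// Copy of the example policy so this block is self-contained.
const examplePolicy: ShellPolicy = {
  allow: ["ls", "cat", "grep", "rg", "sed", "python -m pytest"],
  deny: ["rm -rf", "sudo", "curl | sh", "chmod 777", "docker system prune"],
  requireApproval: ["git push", "npm publish", "kubectl", "terraform apply"],
};

// True when `command` starts with `prefix`, compared token by token,
// so "ls" matches "ls -la" but not "lsblk".
function hasTokenPrefix(command: string, prefix: string): boolean {
  const commandTokens = command.trim().split(/\s+/);
  const prefixTokens = prefix.trim().split(/\s+/);
  return prefixTokens.every((tok, i) => commandTokens[i] === tok);
}

function evaluateShell(command: string, policy: ShellPolicy): ShellDecision {
  // Pipes, chaining, redirection, and substitution cannot be judged by prefix,
  // so this sketch refuses them entirely (assumed behavior).
  if (/[|;&`<>\n]|\$\(/.test(command)) return "deny";
  if (policy.deny.some((d) => hasTokenPrefix(command, d))) return "deny";
  if (policy.requireApproval.some((r) => hasTokenPrefix(command, r))) {
    return "require_approval";
  }
  if (policy.allow.some((a) => hasTokenPrefix(command, a))) return "allow";
  return "deny"; // default-deny anything unlisted
}
```

Under these assumptions, `evaluateShell("cat a.txt; rm -rf /", examplePolicy)` is denied even though it starts with an allowed command, because the `;` marks it as compound. Prefix matching remains a coarse filter: it cannot tell `sed -n` from `sed -i`, so the sandbox tier is still needed behind it.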

Approval card must include

  • action — what will run
  • reason — why it's running
  • scope — what it affects
  • diff — what will change
  • rollback — how to undo
  • risk — severity tier
  • timeout — what happens if approval times out
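The seven fields above map naturally onto a typed record that the platform can validate before showing a card to a human. This is a sketch under assumptions: the `risk` values reuse the L0 to L5 tier labels, the `timeout` shape (`seconds` plus an `onExpire` behavior) is invented for illustration, and `missingFields` is a hypothetical helper.

```typescript
interface ApprovalCard {
  action: string;   // what will run
  reason: string;   // why it's running
  scope: string;    // what it affects
  diff: string;     // what will change
  rollback: string; // how to undo
  risk: "L0" | "L1" | "L2" | "L3" | "L4" | "L5"; // severity tier
  // Assumed shape: how long to wait and what happens when approval times out.
  timeout: { seconds: number; onExpire: "deny" | "escalate" };
}

const requiredFields = [
  "action",
  "reason",
  "scope",
  "diff",
  "rollback",
  "risk",
  "timeout",
] as const;

// Returns the names of fields that are absent or blank, in the order listed above.
function missingFields(card: Partial<ApprovalCard>): string[] {
  return requiredFields.filter((field) => {
    const value = card[field];
    return (
      value === undefined ||
      (typeof value === "string" && value.trim() === "")
    );
  });
}
```

A card with only `action` and `reason` filled in would come back with `scope`, `diff`, `rollback`, `risk`, and `timeout` flagged, which lets the platform refuse to display one-line cards.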

Common failures

  1. Sub-agents call tools the host agent doesn't know about.
  2. Permissions expand after a handoff.
  3. MCP server exposes too many tools.
  4. Approval cards show one line of prose without diff or impact.
  5. Trace records sensitive info without redaction.
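Failures 2 and 5 in the list above lend themselves to small mechanical checks. The sketch below shows one possible approach to each. It narrows a grant at handoff so that the receiving agent never holds more than the sender, and it masks obvious secrets before a line reaches the trace. The `Grant` shape, the function names, and the secret pattern are assumptions, and the pattern is deliberately simple and incomplete.

```typescript
// Hypothetical permission grant: highest tier allowed plus the tools an agent may call.
interface Grant {
  maxTier: number;
  tools: string[];
}

// The receiving agent gets at most what the sender had:
// the lower tier wins and the tool lists are intersected.
function narrowOnHandoff(sender: Grant, requested: Grant): Grant {
  const senderTools = new Set(sender.tools);
  return {
    maxTier: Math.min(sender.maxTier, requested.maxTier),
    tools: requested.tools.filter((tool) => senderTools.has(tool)),
  };
}

// Assumed pattern for key/value style secrets; real traces need a broader rule set.
const secretPattern = /\b(api[_-]?key|token|password|secret)\s*[=:]\s*\S+/gi;

// Replaces the secret value with a placeholder and keeps the key name for debugging.
function redact(line: string): string {
  return line.replace(secretPattern, (_match, key: string) => `${key}=[REDACTED]`);
}
```

For example, a sub-agent that requests tier 4 with `shell` and `deploy` from a sender holding tier 2 with `read` and `shell` would end up with tier 2 and `shell` only.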