Security · 8 min read · Apr 20, 2026

We Built a Firewall for AI Agents. Here's What Happened When Claude Code Tried to Delete Our Files.

You give Claude Code your terminal. You give Codex your codebase. You give OpenClaw your entire machine — and they can rm -rf your home directory or git push --force to production before you know it. AuthSec Agent Shield stops that at the OS level. Here's what happened when we put it head-to-head with Claude Code.

Tags: AI Agents, Security, Claude Code, MCP

Ritam Kumar Kundu

Engineering

The Problem

You give Claude Code your terminal. You give Codex your codebase. You give OpenClaw your entire machine. They can rm -rf your home directory, git push --force to production, DROP TABLE users, or kubectl delete namespace production — and you'll only find out after it's done.

We built something to stop that.

AuthSec Agent Shield is a single binary you install on your machine. It replaces dangerous system commands (rm, git, docker, kubectl, mysql, psql) with shield shims at the OS level. When any process, AI or otherwise, tries to run a risky command, your phone buzzes and you approve or deny. The command executes only if you approve.

We tested it live against Claude Code. Here's what happened.

The Test

We installed Agent Shield on a Windows machine with Claude Code (Anthropic's AI coding assistant) running. Then we asked Claude Code to do increasingly dangerous things. Each command triggered the shield, scored the risk locally, and requested approval via push notification before executing.

What Claude Code Tried — and What Happened

Every risky attempt hung the terminal while the shield waited for our approval. Safe, read-only commands passed instantly with zero latency.

  • rm -rf /tmp/shield-test/important.txt — score 55 (medium). BLOCKED. Terminal hung waiting. File survived. Phone buzzed. We denied.
  • git push --force origin main — score 100 (critical). BLOCKED. Matched dangerous command, force flag, and git push detection.
  • kubectl delete namespace production — score 100 (critical). BLOCKED.
  • docker rm -f my-container — score 50 (medium). BLOCKED.
  • mysql -e 'DROP TABLE users' — score 100 (critical). BLOCKED.
  • git status — score 0. PASSED instantly.
  • docker ps — score 0. PASSED instantly.

Layer 1: Binary Shims

The shield literally replaces system binaries with itself. The rm, git, docker, kubectl, mysql, and psql binaries all point to the shield; the originals are backed up as .rm.shield-real, .git.shield-real, and so on.

When any process calls rm, it hits the shield. The shield scores the risk, requests approval if needed, and only then calls the real binary.

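A process that runs one of these commands by name cannot skip the check, because the shield is the binary it finds at that path. The AI tool doesn't know it's being guarded.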
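The shim flow described above (score, request approval if needed, then call the real binary) can be sketched roughly as follows. This is a minimal illustration and not the actual Agent Shield code: `run_shim`, its `score` and `approve` callbacks, and the `execute` parameter are hypothetical names, and only the `.shield-real` backup naming comes from the article.

```python
import os
import sys
from pathlib import Path
from typing import Callable, List, Sequence

REAL_SUFFIX = ".shield-real"  # backup naming taken from the article


def real_binary_path(shim_path: str) -> Path:
    """Map a shimmed path such as /usr/bin/rm to /usr/bin/.rm.shield-real."""
    shim = Path(shim_path)
    return shim.with_name(f".{shim.name}{REAL_SUFFIX}")


def run_shim(
    shim_path: str,
    args: Sequence[str],
    score: Callable[[List[str]], int],            # hypothetical local scorer
    approve: Callable[[List[str], int], bool],    # hypothetical approval request
    threshold: int = 31,                          # 31+ needs approval, per the article
    execute: Callable[[str, List[str]], object] = os.execv,
) -> int:
    """Score the command, ask for approval when needed, then run the real binary."""
    argv = [Path(shim_path).name, *args]
    risk = score(argv)
    if risk >= threshold and not approve(argv, risk):
        command = " ".join(argv)
        print(f"shield: denied '{command}' (score {risk})", file=sys.stderr)
        return 1
    # os.execv replaces the current process, so with the default `execute`
    # nothing after this call runs; the return value only matters for test doubles.
    execute(str(real_binary_path(shim_path)), argv)
    return 0
```

Passing `execute` as a parameter is only a convenience for trying the flow without replacing the running process; a real shim would also need to resolve its own install path rather than receive it as an argument.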

Layer 2: Filesystem Protection

NTFS DENY write ACLs on Windows, or chattr +i on Linux, lock down sensitive directories and files such as .ssh, .aws, .kube, and .env. No process can write to these without the shield unlocking them first.
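As a rough illustration of what locking a path could look like, the sketch below only builds the command line for each platform and does not run it. The article names chattr +i for Linux; using icacls to add the NTFS DENY entry is an assumption on our part, and `lock_command` is a hypothetical helper, not part of Agent Shield.

```python
from typing import List


def lock_command(path: str, system: str, user: str = "") -> List[str]:
    """Return the argv that would make `path` read-only on the given platform."""
    if system == "Linux":
        # chattr +i marks the file immutable; it needs root and is not
        # recursive unless -R is added.
        return ["chattr", "+i", path]
    if system == "Windows":
        if not user:
            raise ValueError("a user or group name is required on Windows")
        # Assumption: icacls adds an explicit DENY entry for write access.
        return ["icacls", path, "/deny", f"{user}:(W)"]
    raise ValueError(f"unsupported system: {system}")
```

The `system` argument takes the same strings that `platform.system()` returns, so a caller could pass that value straight through and hand the result to `subprocess.run(..., check=True)`.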

Layer 3: Shell Hooks

A PowerShell CommandValidationHandler wraps cmdlets like Remove-Item and Set-Content. On Unix we hook Bash preexec via the DEBUG trap and use Zsh's native preexec. This layer catches commands that don't go through shimmed binaries — so the coverage holds even when an agent uses shell built-ins.

Layer 4: MCP Proxy

A transparent JSON-RPC proxy sits between Claude Desktop, Codex, or Cursor and their MCP servers. It intercepts write_file, delete_file, and execute_command tool calls before they reach the server — so risky actions are gated even when they never touch a shell.
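A minimal sketch of the interception check, assuming the proxy sees each JSON-RPC message as a string. MCP tool invocations use the `tools/call` method with the tool name in `params.name`; the function names, the error code, and the error message below are illustrative choices, not Agent Shield's actual behavior.

```python
import json
from typing import Optional, Union

# Tool names the article lists as intercepted.
GATED_TOOLS = {"write_file", "delete_file", "execute_command"}


def gated_tool(raw_message: str) -> Optional[str]:
    """Return the tool name if the message is a gated MCP tool call, else None."""
    try:
        msg = json.loads(raw_message)
    except json.JSONDecodeError:
        return None
    if not isinstance(msg, dict) or msg.get("method") != "tools/call":
        return None
    params = msg.get("params")
    name = params.get("name") if isinstance(params, dict) else None
    if isinstance(name, str) and name in GATED_TOOLS:
        return name
    return None


def denial_response(request_id: Union[int, str, None]) -> str:
    """Build a JSON-RPC error reply to send back instead of forwarding the call."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        # -32000 sits in JSON-RPC's implementation-defined server error range.
        "error": {"code": -32000, "message": "Denied by approval policy"},
    })
```

Messages for which `gated_tool` returns None would be forwarded unchanged, which is what keeps the proxy transparent for everything except the gated tools.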

Local Risk Scoring (Zero Latency for Safe Commands)

Every command is scored locally. No network call. Scores 0–30 pass silently. Anything 31 or above pushes a notification to your phone via AuthSec CIBA (Client-Initiated Backchannel Authentication).

The scoring rubric stacks signals additively:

  • Destructive verb (rm, drop): +40 points.
  • Dangerous pattern match: +60 points.
  • Force flag (--force, -rf): +20 points.
  • Recursive operation: +15 points.
  • Sensitive path (.env, .ssh): +25 points.
  • System path (/etc, C:\Windows): +30 points.
  • SQL destructive statement: +60 points.
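The additive stacking can be sketched in a few lines. The article does not spell out how a command line is mapped to signals, so this sketch takes the already-detected signal names as input. The signal keys are our own labels, and the cap at 100 is inferred from the 0–100 range the article uses.

```python
from typing import Dict, Iterable

# Weights copied from the rubric above; the key names are illustrative.
WEIGHTS: Dict[str, int] = {
    "destructive_verb": 40,
    "dangerous_pattern": 60,
    "force_flag": 20,
    "recursive": 15,
    "sensitive_path": 25,
    "system_path": 30,
    "sql_destructive": 60,
}

MAX_SCORE = 100  # assumption: scores are capped at the top of the 0-100 range


def risk_score(signals: Iterable[str]) -> int:
    """Sum the weight of each distinct signal, capped at MAX_SCORE."""
    total = sum(WEIGHTS[name] for name in set(signals))
    return min(total, MAX_SCORE)
```

For example, `risk_score(["destructive_verb", "recursive"])` gives 55, and `risk_score(["dangerous_pattern", "sql_destructive"])` would sum to 120 but is capped at 100. Because the detection rules are not published, the sketch should not be expected to reproduce every score in the test results above.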

Two Products, One Platform

Agent Shield is for teams using third-party AI tools — Claude, Codex, Cursor, OpenClaw. One install per machine, every risky command governed.

Agent Guard is for developers building their own AI agents. Three lines of code add human-in-the-loop approval to any agent framework.

Both feed into a single admin dashboard that gives organizations full visibility:

  • Which users have protection installed.
  • Every risky command attempted, by whom, with risk score and outcome.
  • Approval and denial history with biometric verification records.
  • Configurable risk policies per organization.
  • Full audit trail for SOC 2, ISO 27001, and HIPAA compliance.

Risk Policies and Approval Thresholds

Policies are configurable per organization. A typical production-database rule might target the DELETE action on any resource matching *.database.*, with a base score of 90 and a multi-approval requirement above 70.
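One way to picture such a rule is as a small record with a glob-style resource pattern. The field names and the `PolicyRule` class below are hypothetical, since the article does not show the policy format. Only the values (DELETE, *.database.*, 90, 70) come from the example above.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class PolicyRule:
    action: str                 # e.g. "DELETE"
    resource_pattern: str       # glob pattern, e.g. "*.database.*"
    base_score: int             # score assigned when the rule matches
    multi_approval_above: int   # scores above this need more than one approver

    def matches(self, action: str, resource: str) -> bool:
        """True when the action and resource both fall under this rule."""
        return action.upper() == self.action and fnmatchcase(resource, self.resource_pattern)

    def requires_multi_approval(self) -> bool:
        return self.base_score > self.multi_approval_above


# The production-database rule from the paragraph above.
PROD_DB_RULE = PolicyRule("DELETE", "*.database.*", base_score=90, multi_approval_above=70)
```

With this rule, a DELETE on a resource named `prod.database.users` matches and, at a base score of 90, needs multi-party approval, while the same action on `prod.cache.sessions` does not match the rule at all.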

Approval thresholds follow naturally from the score:

  • 0–30: auto-approve, logged silently.
  • 31–80: single human approval via push notification.
  • 81–100: multi-party — two humans must agree before the command runs.
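The three tiers translate directly into a lookup from score to the number of human approvals required. The function name is ours, and the sketch assumes the default thresholds listed above rather than a per-organization override.

```python
def approvals_required(score: int) -> int:
    """Map a 0-100 risk score to the number of human approvals it needs."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score <= 30:
        return 0  # auto-approve, logged silently
    if score <= 80:
        return 1  # single approval via push notification
    return 2      # multi-party: two humans must agree
```

Under these defaults a score of 55 needs one approver, while a score of 100 needs two.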

The Standard: AAAP

We published the Agent Action Approval Protocol (AAAP) — an open standard for human-in-the-loop governance of AI agent actions.

Standards exist for agent identity (OAuth), agent communication (MCP, A2A), and risk taxonomy (OWASP) — but no standard existed for the runtime approval of individual agent actions by humans. AAAP fills that gap.

AAAP covers 8 of the 10 OWASP Agentic Top 10 risks and aligns with the NIST AI Agent Standards Initiative.

Put simply: OAuth tells agents who they are. MCP tells agents what tools exist. AAAP tells agents what they're allowed to do — and which human said so.

Try It

The bootstrap is four commands. Build the shield binary, configure it with your org's client ID, log in, and install the protection layer. Then run the built-in tester: dangerous commands like 'rm -rf /' return score 100 and block, while safe commands like 'git status' return score 0 and pass.

  • shield setup — configure your client ID and base URL.
  • shield login — authenticate your user.
  • sudo shield install — install the four-layer protection on your machine.
  • shield test '<command>' — dry-run the scorer against any command.

Your machine. Your rules. Your phone is the key.

AuthSec Agent Shield is part of AuthSec — enterprise identity and access management for the AI era. Find us at github.com/authsec-ai.