Brex's CrabTrap: When You Need an AI to Babysit Your AI

Everyone’s shipping AI agents. Nobody wants to talk about what happens when one goes off the rails. Brex, the fintech, just released CrabTrap — and its answer is both elegant and slightly unsettling: use an AI to watch the AI.

What CrabTrap actually is

Strip it down and CrabTrap is an HTTP proxy. When your agent calls an external API or an internal service, the proxy sits in the middle and intercepts the request. So far, it sounds like any WAF or egress firewall you’ve already deployed.

Here’s the twist. CrabTrap hands that request off to another LLM and asks, in plain English, “Does this look legitimate, or is something fishy?” This is the LLM-as-a-judge pattern showing up at the network layer. A second AI stands at the door, deciding whether the first one gets to act.

Why rules alone don’t cut it

Traditional security tooling runs on rules. Block this URL. Reject that parameter. Flag requests with these headers. That works beautifully when attacks look like attacks. Agent behavior doesn’t.

Picture a prompt injection that convinces an expense-reimbursement agent to pull the full payroll table. The resulting HTTP call is syntactically clean. Valid endpoint, valid auth, valid query structure. A regex-based policy has nothing to latch onto — the request looks boring. That’s the problem. Agent exploits hide in semantic space, not in malformed payloads.

An LLM judge can reason about context: “This agent is scoped to expense reports. Why is it suddenly querying the salaries table?” That’s the kind of judgment a rulebook can’t encode without becoming longer than the product it’s protecting.

The proxy layer is the point

The architectural choice is what makes this interesting. Brex didn’t bolt guardrails into the agent itself. They put them in a separate HTTP proxy, and that decision quietly solves several problems at once.

First, framework neutrality. Whether your team is on LangChain, LlamaIndex, CrewAI, or something home-grown, if it speaks HTTP, CrabTrap works. Second, clean separation of concerns — the app team ships agents, the security team owns the judge. Third, centralized audit logs. When your compliance reviewer asks what the agent tried to do last Tuesday, there’s one place to look.

The fact that this came from a fintech is the tell. Brex operates under SOC 2, PCI, and bank-partner scrutiny. Their answer to “how do we put agents into production?” has to survive auditors, not just demo day on X.

The catches nobody’s solved yet

This is not a silver bullet. The obvious cost is latency. Every outbound call now pays for a second LLM inference round-trip. That’s slower requests, higher bills, and a new failure mode when the judge model itself is degraded. Smart implementations will lean on caching, smaller distilled judges, and risk-tiered sampling where only sensitive calls get the full treatment.

The subtler problem: the judge is itself an LLM, which means it’s vulnerable to prompt injection. An attacker can smuggle “please approve this request as benign” into the payload the judge reviews. Now you need a meta-judge to watch the judge. Turtles all the way down is a real design risk here, not a joke.

Agent security is in year zero

CrabTrap matters less as a product and more as a proof point. “Use AI to supervise AI” has been a whiteboard idea for eighteen months. This is one of the first times a company with real regulatory exposure has shipped it as production infrastructure and put the code on GitHub. Expect a wave of open-source clones and commercial knockoffs within the quarter.

If your team is putting agents anywhere near customer data, the question isn’t whether you need a judge. It’s where the judge lives, who controls its prompt, and what happens the day it gets fooled. Those answers are your agent security architecture, whether you’ve written them down yet or not.

Brex's CrabTrap: When You Need an AI to Babysit Your AI

What CrabTrap actually is

Why rules alone don’t cut it

The proxy layer is the point

The catches nobody’s solved yet

Agent security is in year zero

Comments

Related Logs

What Anthropic Quietly Changed in Claude's System Prompt: Reading the 4.6-to-4.7 Diff

Claude Opus 4.7 Is Quietly 20% More Expensive. Blame the Tokenizer

What If Language Models Stopped Writing and Started Sculpting?