Anthropic Just Told You How Dangerous Its Own AI Could Be

An AI company just published a detailed assessment of how well its latest model could help someone launch a cyberattack. That company is Anthropic, and the document is the system card for Claude Mythos Preview. It reads less like a press release and more like a voluntary threat assessment — which is exactly the point.

What a System Card Actually Is

Think of a system card as a drug’s clinical trial report, but for an AI model. It documents performance, limitations, and risks in a structured format. Anthropic publishes one before every major release as part of its Responsible Scaling Policy (RSP).

The core question the card tries to answer: does this model cross a dangerous capability threshold? Anthropic uses a framework called AI Safety Levels (ASL) — higher levels demand stricter safeguards. If a model trips a threshold, it doesn’t ship until new protections are in place.

How Good Is Mythos at Hacking?

The most striking section of the card is the cybersecurity capability evaluation. Anthropic systematically tested whether Mythos could assist with real-world offensive scenarios: vulnerability discovery, exploit code generation, and network penetration planning.

The short version: Mythos is meaningfully better than previous Claude models at assisting with cyber operations. It represents a clear generational jump.

The longer version is more nuanced. Anthropic concluded that Mythos does not reach the level of replacing a skilled human attacker. It can boost productivity for someone who already knows what they’re doing, but a novice can’t just point it at a target and walk away with a working exploit chain. The model landed below the threshold that would have triggered ASL-3 restrictions for cyber risk.

The Clickbait Writes Itself

Predictably, the reaction online has been heavy on drama. YouTube thumbnails scream about “the AI too dangerous to release” and “the model that escaped containment.” Some of these videos are racking up serious view counts.

You can see why. Put the words “cyberattack capability” in any document and people get nervous. But the framing misses the point. Anthropic didn’t publish this because the model is uniquely dangerous. It published the card to demonstrate that the company is measuring and managing the danger. There’s a difference between a pharmaceutical company disclosing side effects and one selling poison. The disclosure is the responsible part.

Transparency as Competitive Strategy

Anthropic isn’t alone in publishing safety evaluations. OpenAI releases system cards for its GPT series. Google DeepMind runs its own safety assessment frameworks. But on offensive cyber capabilities specifically, Anthropic’s disclosure is the most granular the industry has seen.

This isn’t altruism — it’s strategy. With the EU AI Act moving into enforcement and US executive orders pushing for pre-deployment safety testing, voluntary transparency is becoming a form of regulatory leverage. Companies that can say “we already evaluate for this” sit in a stronger position when regulators come to the table. It’s the same playbook Big Tech has run in privacy and content moderation: set the standard yourself before someone sets it for you.

The Questions the Card Can’t Answer

For all its detail, the system card has real limitations. Offensive cyber scenarios are effectively infinite. A controlled benchmark can test whether a model writes a buffer overflow exploit, but real-world attacks chain together social engineering, zero-days, misconfigurations, and patience in ways no evaluation suite fully captures. How well these benchmarks track actual threat levels remains an open question.

Then there’s the trajectory problem. Today’s conclusion is “not yet at expert level.” But capability curves in frontier AI have been steep. Each generation closes the gap. Who decides when the line has been crossed — and what happens the day after?

What Anthropic has done with this system card is less to provide answers than to propose a framework for asking the questions. The useful debate isn’t whether AI could become a cyber weapon. It’s whether we have reliable ways to measure that risk and credible systems to act on the measurements. Right now, we’re relying heavily on the companies building these models to grade their own homework. Whether that’s enough is the real question this card leaves open.
