What Anthropic Quietly Changed in Claude's System Prompt: Reading the 4.6-to-4.7 Diff
Claude Opus 4.7 landed last week without much fanfare. What actually got people talking wasn’t the release notes — it was the fact that Anthropic quietly rewrote chunks of the system prompt. Once Simon Willison and the usual crew of prompt archaeologists started posting diffs, the reaction across HN and X shifted fast: this isn’t a point release, it’s a philosophy update.
A system prompt works like a constitution for the model: it is the document the model reads before it ever sees you. Changing a single line there can shift personality, refusal patterns, and answer length across millions of conversations. Reading the diff is basically reading Anthropic’s current anxieties in plain English.
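If you want to do this kind of prompt archaeology yourself, a line-level diff of two saved prompt texts is enough to get started. Here is a minimal sketch using Python's standard `difflib`. The two strings are invented placeholders, not the real 4.6 or 4.7 prompt text, and `prompt_diff` is a hypothetical helper name.

```python
import difflib


def prompt_diff(old: str, new: str, old_name: str = "old", new_name: str = "new") -> list[str]:
    """Return a unified diff between two prompt texts, compared line by line."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",  # inputs have no trailing newlines, so emit none
        )
    )


# Placeholder snippets for illustration only; NOT actual system prompt text.
OLD = "Be helpful.\nAvoid emojis.\n"
NEW = "Be helpful.\nMatch the user's emoji use.\n"

for line in prompt_diff(OLD, NEW):
    print(line)
```

Lines prefixed with `-` exist only in the older text and lines prefixed with `+` only in the newer one, which is the same view the posted diffs give you.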
What got cut: sycophancy and throat-clearing
Opus 4.6 already had an anti-sycophancy clause — the famous “don’t start responses by calling the question interesting or great.” In 4.7, Anthropic didn’t soften it. They doubled down. New lines reportedly tell the model to avoid opening with emotional validation and to skip any preamble that praises the user’s decision-making.
Interestingly, the blanket “avoid emojis in casual conversation” rule from 4.6 looks relaxed. The new framing is closer to mirror-the-user: if they bring emojis, you can match. Side-by-side demos circulating on YouTube show 4.6 answering dry and flat where 4.7 actually reads the room.
What got added: guardrails for the agent era
The most consequential additions target agentic use. New clauses instruct the model to evaluate the reversibility of each step in a multi-step task, and to pause for user confirmation before irreversible actions — file deletion, git pushes, sending messages to third parties.
This is the Claude Code and Managed Agents problem made explicit. 4.6 would generally do what you told it. 4.7 is trained, and now also prompted, to ask itself two questions first: does this touch shared infrastructure, and can I undo it? It’s the rm -rf insurance policy, written into the constitution.
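Those two questions translate naturally into a gate you can put in your own agent loop, regardless of what the model does on its side. The sketch below is one possible reading, not Anthropic's implementation: `Step`, `needs_confirmation`, and `execute` are hypothetical names, and pausing on shared infrastructure as well as on irreversible steps is an assumption of this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    reversible: bool        # can this step be undone?
    touches_shared: bool    # does it affect shared infrastructure?
    run: Callable[[], None]


def needs_confirmation(step: Step) -> bool:
    """Pause when a step cannot be undone or reaches beyond the local sandbox."""
    return (not step.reversible) or step.touches_shared


def execute(steps: list[Step], confirm: Callable[[Step], bool]) -> list[str]:
    """Run steps in order; stop at the first risky step the user declines."""
    executed: list[str] = []
    for step in steps:
        if needs_confirmation(step) and not confirm(step):
            break  # leave the remaining steps untouched
        step.run()
        executed.append(step.description)
    return executed
```

In practice `confirm` would prompt a human. Stopping the whole plan on a declined step, rather than skipping it, is a design choice here: later steps often depend on earlier ones.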
What got quietly trimmed: topic avoidance
Willison flagged one subtler shift. The broad “stay neutral on all contested topics” language from 4.6 has been narrowed in 4.7. The specific guardrails around elections and personal opinions on politicians are still there, but the catch-all avoidance clause got surgically replaced with something more granular.
The subtext matters. Over-refusal, where models dodge anything remotely spicy, has become a real usability complaint, and Anthropic is clearly now weighing it against the old instinct to refuse by default. The pendulum is swinging back, cautiously.
The war on verbosity
4.7 also leans harder on format discipline. New lines push the model toward short, direct answers in conversational exchanges and explicitly tell it to stop reaching for markdown headers and bullet lists by reflex. HN threads are full of devs confirming the vibe shift: three-line questions no longer get twenty-line answers with three h2s and a bulleted conclusion. For anyone paying per token, that’s not cosmetic — that’s a real cost delta.
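To put a rough number on that cost delta, the arithmetic is just output tokens times volume times price. Every figure in this sketch is a placeholder chosen to make the math easy to follow, not a measured reply length or a real price, and `output_cost` is a hypothetical helper.

```python
def output_cost(tokens_per_reply: int, replies: int, usd_per_million_tokens: float) -> float:
    """Cost in USD of generating `replies` answers of a given average length."""
    return tokens_per_reply * replies * usd_per_million_tokens / 1_000_000


# Placeholder numbers: substitute your own traffic and your provider's pricing.
REPLIES_PER_MONTH = 1_000_000
PRICE_PER_MILLION = 10.0   # hypothetical USD per million output tokens
VERBOSE_TOKENS = 400       # hypothetical average reply length before
CONCISE_TOKENS = 120       # hypothetical average reply length after

before = output_cost(VERBOSE_TOKENS, REPLIES_PER_MONTH, PRICE_PER_MILLION)
after = output_cost(CONCISE_TOKENS, REPLIES_PER_MONTH, PRICE_PER_MILLION)
print(f"monthly savings: ${before - after:,.2f}")
```

The point is that savings scale linearly with the drop in average reply length, so even a modest trim matters at high volume.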
What the diff is actually telling us
Anthropic’s own blog called this a minor tuning pass. The diff disagrees. You can read three distinct pressure points in the changes: agent safety, less over-refusal, and brevity. Those happen to be the exact three puzzles every frontier lab is grinding on right now.
Moving to 4.7 isn’t a benchmark bump. It’s Anthropic’s current answer to the question of how a model should behave: where it should pause, where it should push through, how long it should talk. If you’re running Claude in production, don’t just bump the model ID. Run a fixed set of your own prompts through both versions and compare the refusals and the tone. Sometimes a line in the system prompt moves more than a year of pretraining does.
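A comparison like that can start very small. The sketch below takes two callables that map a prompt to a reply, so it makes no assumptions about any particular SDK. The refusal check is a crude keyword heuristic and the marker list is an assumption of this example. All names are hypothetical.

```python
from typing import Callable

# Assumed refusal phrasings; extend this list with what you see in your own logs.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm not able to")


def looks_like_refusal(reply: str) -> bool:
    """Heuristic: does the opening of the reply contain a refusal phrase?"""
    head = reply.replace("\u2019", "'").lower()[:200]  # normalize curly apostrophes
    return any(marker in head for marker in REFUSAL_MARKERS)


def compare(
    prompts: list[str],
    old_model: Callable[[str], str],
    new_model: Callable[[str], str],
) -> list[dict]:
    """Run each prompt through both models and record what changed."""
    rows = []
    for prompt in prompts:
        old_reply, new_reply = old_model(prompt), new_model(prompt)
        rows.append(
            {
                "prompt": prompt,
                "refusal_changed": looks_like_refusal(old_reply) != looks_like_refusal(new_reply),
                "word_delta": len(new_reply.split()) - len(old_reply.split()),
            }
        )
    return rows
```

Wire `old_model` and `new_model` to whatever client you already use, then review the rows where `refusal_changed` is true or `word_delta` is large. Tone still needs a human read; this only tells you where to look.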