Claude Opus 4.7 Is Quietly 20% More Expensive. Blame the Tokenizer
Something odd started showing up in developer Discords and HN threads last week. Same prompts, same workflows, same codebase — but the Anthropic invoice came in 20% higher. Some teams reported closer to 30%. The published per-token pricing on Opus 4.7 is identical to 4.6. So where’s the money going? Into a component almost nobody pays attention to: the tokenizer.
The invisible meter under every API call
Before an LLM can process your text, it has to chop it into tokens — discrete units that map to integer IDs. That’s the tokenizer’s job. The thing is, the API bills you by tokens, not characters. So if the tokenizer slices the same sentence into more pieces, you pay more. Same input, same output, bigger bill.
Opus 4.6’s tokenizer was well-tuned for English and code, but notoriously loose with CJK languages and certain structured formats. Opus 4.7 ships a reworked tokenizer: larger vocabulary, better handling of multimodal inputs, and a redesigned scheme for tool-use tokens. On paper, this should make things cheaper — larger vocabulary means each token carries more content.
Why the bill went up anyway
Here’s the counterintuitive part. For some languages (Korean, Japanese), token counts per sentence actually dropped. But the savings get eaten elsewhere.
First, system prompt and tool-definition overhead grew. Opus 4.7 ships with more elaborate tool-calling schemas and baked-in safety scaffolding. Those tokens ride along on every request, whether you asked for them or not. Even a one-line user query now carries a heavier baseline.
Second, output length crept up. 4.7 tends to write longer, better-reasoned responses. Quality improved. But output tokens cost 5x input tokens — $15 vs $75 per million. A 10% bump in output length hits the invoice much harder than it looks.
Third, thinking tokens are on by default, and 4.7 uses more of them on average. Those are the internal reasoning steps the model runs before replying. You don’t see them in the UI. You absolutely see them on the receipt.
The damage depends on your workload
For balanced workloads like code review, the delta is usually 5-10%. Nothing to panic about. But long-context summarization and agentic workflows — the kind where the model chains tool calls across a session — are where the 20-30% jumps show up consistently. The more tool calls per session, the longer the conversation, the heavier the shift toward non-Latin scripts, the bigger the spread.
Claude Code users are feeling this most acutely. File reads, edits, searches, greps — every action burns tokens, and 4.7 packs richer context into each tool call. The agent is genuinely smarter. Thirty minutes of work produces noticeably higher bills than it did a month ago.
How to stop the leak
Three concrete moves. First, lean hard on prompt caching. Cache hits bill at 10% of the input rate on 4.7 — your system prompt and recurring context should live in cache, full stop. Second, route by task. Simple classification and summarization belong on Haiku 4.5 or Sonnet 4.6. Reserve Opus 4.7 for the reasoning-heavy work that actually needs it. Third, cap thinking token budgets explicitly via the budget_tokens parameter. You can often clip reasoning length significantly with minimal quality loss.
The takeaway
The price list was honest. The tokenizer rewrote the receipt without telling anyone. As AI moves from demo toy to production line item, reading the per-million sticker price is no longer enough — the real cost lives in token accounting, and that accounting changes with every model release. If your bill looks weird this month, open the token usage dashboard before you open a support ticket. How does yours compare to last month?
Comments
Loading comments...