12 Million Tokens or Vaporware? The Subquadratic Bet to Kill Transformers

A startup named Subquadratic (SubQ for short) is making the kind of claims that end in either a Nobel-adjacent breakthrough or a Netflix documentary. 12 million tokens of context. 1,000x faster inference. Goodbye Claude, ChatGPT, and Gemini, allegedly. The AI world is split between cautious excitement and outright skepticism, and both camps have a point.

Why Transformers Hit a Wall

Nearly every LLM you use today rides on the transformer architecture from the 2017 “Attention Is All You Need” paper. The magic ingredient is attention: every token in your input compares itself against every other token.

That comparison is also the curse. Double the tokens and the attention compute quadruples, because the number of token pairs grows with the square of the input length. This is quadratic complexity, and it’s why long context windows cost a fortune. Past a million tokens, GPUs start melting and bills start exploding. That’s the ceiling GPT-4, Claude, and Gemini have all been bumping against.
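To put numbers on that curve, here is a back-of-the-envelope sketch in Python. It counts only the token-to-token comparisons in a single full self-attention pass and ignores heads, layers, and kernel-level optimizations, which change the constants but not the shape of the curve. The function name `attention_pairs` is made up for illustration.

```python
def attention_pairs(num_tokens: int) -> int:
    """Token-to-token comparisons in one full self-attention pass (n * n)."""
    return num_tokens * num_tokens


# Print the comparison count at a few context sizes, up to the 12M headline figure.
for n in (1_000, 2_000, 1_000_000, 12_000_000):
    print(f"{n:>12,} tokens -> {attention_pairs(n):>22,} comparisons")

# Doubling the input quadruples the work.
ratio = attention_pairs(2_000) / attention_pairs(1_000)
print(f"2x the tokens costs {ratio:.0f}x the comparisons")
```

Going from one million to 12 million tokens is a 12x longer prompt, but under plain quadratic attention it is 144x the comparisons. That gap is what any subquadratic design has to close.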

The 12-Million-Token Pitch

SubQ’s name is the thesis: an architecture whose compute scales sub-quadratically with context length. Add more tokens without the cost curve going vertical.

The idea isn’t novel. Mamba, RWKV, and Hyena have all chased subquadratic alternatives over the past few years, mostly with promising papers and modest production traction. What makes SubQ different is the number on the box: 12 million tokens, packaged as a real product rather than a research demo.

For scale: 12 million tokens is roughly 100 full-length English novels. Or your company’s entire monorepo. Or every Slack message your engineering team has ever sent. All loaded into a single prompt.
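The novel comparison is easy to sanity-check. The sketch below assumes a 90,000-word novel and the common rule of thumb of about 0.75 English words per token. Both figures are assumptions for illustration, not numbers from SubQ, and real tokenizers vary.

```python
WORDS_PER_NOVEL = 90_000      # assumption: a typical full-length novel
WORDS_PER_TOKEN = 0.75        # assumption: rough rule of thumb for English text
CONTEXT_TOKENS = 12_000_000   # the headline context size

tokens_per_novel = WORDS_PER_NOVEL / WORDS_PER_TOKEN
novels_in_context = CONTEXT_TOKENS / tokens_per_novel

print(f"~{tokens_per_novel:,.0f} tokens per novel")
print(f"~{novels_in_context:,.0f} novels in a 12M-token window")
```

Shift either assumption and the answer moves, but it stays in the same ballpark: on the order of a hundred books in one prompt.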

The Theranos-Shaped Cloud

YouTuber Krittin Kalra titled his takedown video “1,000x Faster AI or the Next Theranos?” — which captures the mood pretty well. If the claims hold, it’s a paradigm shift. If they don’t, it’s a cautionary tale about VC-fueled benchmark inflation.

The skepticism isn’t unfounded. AI startups have a long history of cherry-picked evals, and the technical disclosure around SubQ has been thin. There’s no peer-reviewed paper. No reproducible third-party benchmarks. Just blog posts, a slick demo, and confident numbers. Hacker News threads are full of researchers asking the obvious questions: where’s the architecture writeup, and where are the perplexity numbers on standard datasets?

Why It Matters Even If Half-True

Here’s the thing: even if SubQ delivers half of what it promises, the downstream effects are huge. Most production LLM apps today are elaborate workarounds for short context: RAG pipelines, chunking strategies, vector databases, retrieval reranking. Whole companies exist because a large codebase simply doesn’t fit in Claude’s context window.

A real 12-million-token model collapses that stack. You stop retrieving and just include. The entire wiki, the full repo, every customer ticket — dropped into one prompt. The plumbing layer of modern AI infrastructure becomes a lot less necessary.

The Catch Nobody’s Talking About

Long context isn’t the same as useful long context. The well-documented “lost in the middle” problem — where models reliably attend to the start and end of a prompt but lose track of information buried in the middle — hasn’t gone away just because someone scaled the window up. Gemini’s 2M-token context is technically impressive but practically uneven for exactly this reason.

If SubQ’s architecture genuinely solves middle-of-context recall along with the speed claim, that’s the actual breakthrough. If it just lets you stuff more tokens in while quietly forgetting half of them, it’s a benchmark trick.
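This is the kind of claim a third party could check with a "needle in a haystack" probe: hide one fact at different depths of a long prompt and see whether the model can still retrieve it. The sketch below is a minimal version of that idea, not anyone's official benchmark. `ask_model` is a hypothetical callable standing in for whatever API you would actually test, and `forgetful_model` is a deliberately fake stand-in that only reads the edges of the prompt, so the harness has something to run against.

```python
from typing import Callable, Dict, Iterable

NEEDLE = "The vault code is 7319."
FILLER = "The sky was grey that day."
QUESTION = "\n\nWhat is the vault code? Answer with the number only."


def build_haystack(n_sentences: int, depth: float) -> str:
    """Bury NEEDLE at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    sentences = [FILLER] * n_sentences
    sentences.insert(round(depth * n_sentences), NEEDLE)
    return " ".join(sentences)


def recall_by_depth(
    ask_model: Callable[[str], str],
    depths: Iterable[float],
    n_sentences: int = 1000,
) -> Dict[float, bool]:
    """Return, for each depth, whether the model's answer contains the code."""
    results = {}
    for depth in depths:
        prompt = build_haystack(n_sentences, depth) + QUESTION
        results[depth] = "7319" in ask_model(prompt)
    return results


def forgetful_model(prompt: str) -> str:
    """Fake model (assumption): sees only the first and last 2,000 characters."""
    visible = prompt[:2000] + prompt[-2000:]
    return "7319" if NEEDLE in visible else "unknown"


print(recall_by_depth(forgetful_model, [0.0, 0.5, 1.0]))
# prints {0.0: True, 0.5: False, 1.0: True}
```

Swap `forgetful_model` for a real client and sweep the depth and prompt length. A model with genuinely useful long context should return True across the board, not just at the edges.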

Post-Transformer, Eventually

Transformers won’t reign forever. The quadratic scaling problem has been the field’s elephant in the room for years, and someone, eventually, will architect their way past it. Whether that someone is SubQ, or whether SubQ ends up as a footnote in an AI hype-cycle retrospective, is the open question.

What’s no longer debatable is that “what comes after transformers” has graduated from academic curiosity to investor pitch deck. The next few months — when SubQ either ships verifiable benchmarks or doesn’t — will tell us which story we’re in.
