Simon Willison's Six-Month LLM Recap: Where AI Actually Stands in Spring 2026

Six months in AI feels like six years in any other industry. If you’ve been struggling to keep up, Simon Willison is the analyst worth bookmarking. The Django co-creator turned full-time LLM chronicler has built a daily blog that’s quietly become required reading inside Silicon Valley engineering orgs.

His latest piece, a six-month recap covering late 2025 through spring 2026, is the cleanest summary available. Here’s what he sees, and what most people are still missing.

Reasoning models rewrote the rulebook

When OpenAI shipped o1, the question on Hacker News was blunt: do we actually need this? Six months later, that debate is settled.

Anthropic, Google, DeepSeek, and xAI have all shipped reasoning models. Prices have collapsed. Willison’s framing is the one to remember: LLMs stopped being instant-answer machines and started being thinking machines.

Math-olympiad-level problem solving is now table stakes in mid-tier models. Coding benchmarks crossed expert-human territory months ago and kept climbing. Lenny Rachitsky’s recent “AI state of the union” episode, which has passed 190,000 views, declared that we’d “passed the inflection point,” and the data backs him up.

Agents finally graduated from demo to dependency

A year ago, “AI agent” was a pitch-deck word. Cool to watch, scary to deploy. Willison argues the last six months flipped that.

Claude Code, OpenAI’s Operator, and a wave of open-source frameworks are now executing hour-long and day-long workflows. Writing code, running tests, fixing bugs, opening PRs — end-to-end, with humans as reviewers rather than drivers.

The failures are still ugly. Willison is refreshingly direct about it: a misaligned agent will burn six-figure token counts chasing the wrong rabbit hole. But his point is that the success rate finally crossed the threshold where you can plug them into a real workflow and expect more wins than losses. That’s the line that mattered.
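One common way to cap the cost of a runaway agent is a hard token budget around the agent loop. The sketch below is only an illustration of that idea, not something taken from Willison's recap: `TokenBudget`, `run_agent`, and the `step` callable are hypothetical names, and the token counts are made-up numbers chosen to keep the arithmetic easy to follow.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TokenBudget:
    """Tracks cumulative token spend against a hard limit (hypothetical helper)."""
    limit: int
    spent: int = 0

    def charge(self, tokens: int) -> None:
        self.spent += tokens

    @property
    def exhausted(self) -> bool:
        return self.spent > self.limit


def run_agent(
    step: Callable[[int], Tuple[int, bool]],
    budget: TokenBudget,
    max_steps: int = 50,
) -> str:
    """Run `step` until it reports completion, the budget is blown, or max_steps is hit.

    `step(i)` is an assumed interface: it returns (tokens_used, done).
    """
    for i in range(max_steps):
        tokens_used, done = step(i)
        budget.charge(tokens_used)
        if done:
            return "done"
        if budget.exhausted:
            return "halted_budget"
    return "halted_max_steps"


def wandering_step(i: int) -> Tuple[int, bool]:
    # Simulates an agent that never converges: 40,000 tokens per step, never done.
    return 40_000, False


def quick_step(i: int) -> Tuple[int, bool]:
    # Simulates an agent that finishes on its second step (index 1).
    return 10_000, i == 1


runaway_budget = TokenBudget(limit=100_000)
runaway_result = run_agent(wandering_step, runaway_budget)

quick_budget = TokenBudget(limit=100_000)
quick_result = run_agent(quick_step, quick_budget)

print(runaway_result, runaway_budget.spent)
print(quick_result, quick_budget.spent)
```

The guard checks the budget only after a step completes, so a single expensive step can still overshoot the limit. A production setup would likely also want per-step caps, which this sketch leaves out.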

Multimodal stopped being a feature

Six months ago, “this model reads images” was a launch headline. Now it’s an assumed default. Text, image, audio, and video go into a single model call, and the pricing is almost embarrassing.

Willison keeps returning to audio input as the underrated example. Drop in a meeting recording and you get summary, action items, and speaker diarization in one call. Hour-long lecture videos collapse into structured notes.

His sharper question: why are everyday users not touching any of this? His answer is information asymmetry. The capabilities exploded; the average ChatGPT user is still pasting text and hitting enter, unaware that the model on the other end can ingest their entire podcast backlog.

The price collapse — and where it came from

The most jaw-dropping number from the recap is token cost. GPT-4-class performance is roughly 1/100th the price it was a year ago. Willison calls it the fastest deflation he’s ever seen in any technology category.
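To make the 1/100th figure concrete, here is a small worked example. The starting price and the workload size are hypothetical placeholders, not numbers from the recap. Only the 100x ratio comes from the text above.

```python
from decimal import Decimal


def workload_cost(tokens: int, price_per_million: Decimal) -> Decimal:
    """Cost in dollars of processing `tokens` at a given price per million tokens."""
    return (Decimal(tokens) / Decimal(1_000_000)) * price_per_million


# Hypothetical figures for illustration only.
OLD_PRICE = Decimal("30.00")   # assumed dollars per million tokens a year ago
NEW_PRICE = OLD_PRICE / 100    # the roughly 1/100th price described above
TOKENS = 2_000_000             # assumed workload size

old_cost = workload_cost(TOKENS, OLD_PRICE)
new_cost = workload_cost(TOKENS, NEW_PRICE)

print(f"a year ago: ${old_cost:.2f}")
print(f"today:      ${new_cost:.2f}")
print(f"ratio:      {old_cost / new_cost:.0f}x")
```

Under these assumed numbers, a job that once cost tens of dollars now costs well under one dollar, which is the kind of shift that turns a budgeted experiment into something you run without thinking about it.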

The catalyst was Chinese open-source. DeepSeek, Qwen, and Kimi released frontier-class models as open weights at near-zero cost, and closed labs had no choice but to follow on price. The implication is clean: AI is no longer a scarce resource; it’s becoming ambient infrastructure. Willison’s line on this is sharp: not using AI in 2026 is like not using Google search in 2005.

Capability outran the conversation

Willison closes on a note that lands harder than the rest. The tech sprinted; the societal conversation about how to use it responsibly barely moved.

Hallucinations dropped but didn’t vanish — and reasoning models still produce confidently wrong answers with full justifications attached. Prompt injection remains an unsolved problem, and now that agents have real permissions, the blast radius is bigger than ever.

The last six months were the moment LLMs crossed from impressive demo to actual tool. What makes Willison’s recap useful is that he strips away the marketing and shows the working surface underneath. The question he leaves open is the one worth sitting with: are you using this stuff every day, or still telling yourself you’ll get around to it?
