Stop Stuffing Your Prompts. Your AI Agent Needs Control Flow.
If you’ve shipped an AI agent recently, you know the wall. Your system prompt creeps from 50 lines to 200. You add another “NEVER do X” bullet. The agent still calls the wrong tool, repeats itself, or skips a step that mattered. So you write another bullet.
The uncomfortable truth quietly going mainstream on YouTube, Hacker News, and dev Twitter: prompt engineering has a ceiling, and most teams have already hit it. The agents that actually work in production aren’t built on cleverer prompts. They’re built on control flow — old-fashioned software architecture, with the LLM placed surgically inside it.
Why prompts alone break
For a stretch in 2024 and 2025, the vibe was “just prompt harder.” Cram every rule into the system prompt. Stack few-shot examples. Sprinkle “think step by step” like seasoning. It worked well enough for single-turn tasks that the industry convinced itself it would scale.
It doesn’t. The cracks open the moment your agent has to make multi-step decisions: take a customer ticket, query the database, evaluate a refund policy, execute or escalate. If the LLM misjudges any step — and over a long enough session, it will — the whole chain collapses. You can’t prompt your way out of compounding error rates.
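A back-of-the-envelope calculation shows why. Assume, purely for illustration, that every step succeeds independently with the same probability. The chance the whole chain succeeds is then that probability raised to the number of steps. The 95% per-step rate here is a hypothetical figure, not a benchmark, and real errors are often correlated, so treat this as intuition rather than a model of your system.

```python
def chain_success_rate(per_step: float, steps: int) -> float:
    """End-to-end success probability for a chain of independent steps.

    Assumes every step succeeds independently with the same probability,
    a simplification chosen for illustration only.
    """
    return per_step ** steps


if __name__ == "__main__":
    # Hypothetical per-step reliability of 95%.
    for steps in (1, 4, 10, 20):
        rate = chain_success_rate(0.95, steps)
        print(f"{steps:>2} steps at 95% each -> {rate:.0%} end-to-end")
```

Under these assumptions a ten-step chain finishes cleanly only about 60% of the time, and a twenty-step chain about 36%. A better prompt nudges the per-step number. Moving a step into deterministic code takes it out of the exponent.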
ForrestKnight’s “Everything You Need to Know About Coding with AI,” now north of 150K views, lands on a blunt version of this: stop vibe coding and start with the structure. Let the model do the work, but humans own the skeleton.
What “control flow” actually means
Strip away the buzzwords and it’s the boring stuff: deciding in code when to call the LLM and when not to. You’re not delegating every judgment to a stochastic model. You’re choosing where stochasticity earns its keep.
This is what software engineers have always done: plain if statements, switch blocks, state machines, durable workflow engines like Temporal, graph-based orchestrators like LangGraph. None of that disappeared in the agent era. It got more important.
A concrete example. A user message arrives:
- Classify intent — LLM is great at this
- If intent is “refund” — branch in code, not in a prompt
- Check refund policy — that’s a SQL query, not an inference
- Policy passes — call the payments API directly, no LLM in the loop
- Policy fails — escalate to a human
The LLM does exactly two jobs: classifying intent at the start and writing the friendly reply the customer sees at the end. Everything load-bearing is deterministic code. That’s what a stable production agent looks like.
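A minimal Python sketch of that flow, under loud assumptions: classify_intent and write_reply are hypothetical stand-ins for the two LLM calls (stubbed with string logic so the sketch runs offline), the refund policy lives in a hypothetical SQLite orders table with a made-up 30-day rule, and the payment and escalation hooks are hypothetical callables you would wire to real systems. Non-refund intents go to a human only to keep the example short.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable


def classify_intent(message: str) -> str:
    # Hypothetical stand-in for LLM call #1 (intent classification).
    return "refund" if "refund" in message.lower() else "other"


def write_reply(outcome: str) -> str:
    # Hypothetical stand-in for LLM call #2 (friendly customer reply).
    return f"Thanks for your patience! Update: {outcome}."


def refund_allowed(db: sqlite3.Connection, order_id: int) -> bool:
    # The policy check is a SQL query, not an inference.
    # Hypothetical rule: refundable if delivered within the last 30 days.
    row = db.execute(
        "SELECT days_since_delivery <= 30 FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    return bool(row and row[0])  # unknown order -> not allowed


@dataclass
class Result:
    outcome: str
    reply: str


def handle_message(
    db: sqlite3.Connection,
    message: str,
    order_id: int,
    issue_refund: Callable[[int], str],
    escalate: Callable[[int, str], str],
) -> Result:
    intent = classify_intent(message)
    if intent != "refund":
        # Branch in code, not in a prompt. Other intents would get their own
        # branches; in this sketch they simply go to a human.
        outcome = escalate(order_id, f"unhandled intent: {intent}")
    elif refund_allowed(db, order_id):
        outcome = issue_refund(order_id)  # payments API, no LLM in the loop
    else:
        outcome = escalate(order_id, "refund policy failed")
    return Result(outcome, write_reply(outcome))


# Usage with an in-memory database and fake payment/escalation hooks.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, days_since_delivery INTEGER)")
db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 45)])


def fake_refund(order_id: int) -> str:
    return f"refund issued for order {order_id}"


def fake_escalate(order_id: int, reason: str) -> str:
    return f"escalated order {order_id} ({reason})"


print(handle_message(db, "I want a refund", 1, fake_refund, fake_escalate).outcome)
# refund issued for order 1
print(handle_message(db, "Refund my late order", 2, fake_refund, fake_escalate).outcome)
# escalated order 2 (refund policy failed)
```

The model never sees the policy threshold or the payments call. Swapping in a better model touches only the two stub functions; the branching stays put.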
The real point of “Stop Building AI Agents”
Zubair Trabzada’s “STOP Building AI Agents. Do THIS Instead.” has cleared 230K views with a deliberately provocative title. The message underneath is simple: don’t reach for autonomous agents first. Start with a workflow.
Anthropic’s own guidance, in its “Building effective agents” post, draws the same line. Workflows follow predefined code paths. Agents let the LLM decide what happens next at each turn. Both have a place. But the dirty secret is that the overwhelming majority of business problems are workflows in disguise. You only need agentic autonomy in the narrow slices where the next step genuinely can’t be enumerated in advance.
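The distinction is easy to see in code. In this plain-Python sketch, a workflow runs a sequence fixed in code, while an agent asks a model which tool to run next, with a turn budget as a guardrail. The tool functions are hypothetical, and fake_model is a stand-in for what would be an LLM call in a real agent.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Step = Callable[[State], State]


def run_workflow(steps: List[Step], state: State) -> State:
    # Workflow: the order of steps is fixed in code. A step may call an LLM
    # internally, but the model never decides what runs next.
    for step in steps:
        state = step(state)
    return state


def run_agent(
    choose_next: Callable[[State], str],
    tools: Dict[str, Step],
    state: State,
    max_turns: int = 5,
) -> State:
    # Agent: each turn, the model (choose_next) names the next tool or "done".
    # The turn budget and the tool whitelist are still enforced in code.
    for _ in range(max_turns):
        action = choose_next(state)
        if action == "done":
            return state
        if action not in tools:
            raise ValueError(f"model picked an unknown tool: {action!r}")
        state = tools[action](state)
    raise RuntimeError("agent exceeded its turn budget")


# Hypothetical tools shared by both styles.
def lookup_order(state: State) -> State:
    return {**state, "order": "found"}


def summarize(state: State) -> State:
    return {**state, "summary": f"order {state['order']}"}


def fake_model(state: State) -> str:
    # Stand-in for an LLM choosing the next step from the current state.
    if "order" not in state:
        return "lookup_order"
    if "summary" not in state:
        return "summarize"
    return "done"


print(run_workflow([lookup_order, summarize], {}))
print(run_agent(fake_model, {"lookup_order": lookup_order, "summarize": summarize}, {}))
```

Both calls produce the same state, which is the point: when the sequence can be written down in advance, the agent loop adds model calls (three here, one per turn) without adding capability.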
The pragmatic reasons matter too: latency and token cost. An LLM call at every node means 3-to-5-second response times and a bill that scales linearly with users. Replacing model calls with code where you can is the difference between a demo and a product.
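A rough estimator makes the trade-off concrete. Every constant in this sketch (per-call latency, per-call price, traffic) is a hypothetical placeholder to replace with your own measurements; the per-call latency is chosen only so that a five-node, all-LLM chain lands inside the 3-to-5-second range above.

```python
from dataclasses import dataclass

# Hypothetical placeholders: substitute numbers measured on your own stack.
LLM_CALL_SECONDS = 0.9
CODE_STEP_SECONDS = 0.02
DOLLARS_PER_LLM_CALL = 0.002


@dataclass
class Estimate:
    latency_s: float
    daily_cost_usd: float


def estimate(nodes: int, llm_nodes: int, requests_per_day: int) -> Estimate:
    """Sequential pipeline: latency adds up per node, cost scales with traffic."""
    code_nodes = nodes - llm_nodes
    latency = llm_nodes * LLM_CALL_SECONDS + code_nodes * CODE_STEP_SECONDS
    cost = llm_nodes * DOLLARS_PER_LLM_CALL * requests_per_day
    return Estimate(latency, cost)


for label, llm_nodes in (("LLM at every node", 5), ("LLM only where needed", 2)):
    e = estimate(nodes=5, llm_nodes=llm_nodes, requests_per_day=10_000)
    print(f"{label:<22} {e.latency_s:.2f} s/request  ${e.daily_cost_usd:,.2f}/day")
```

Cost is linear in requests_per_day by construction, so doubling traffic doubles the bill. The only lever the formula leaves you is llm_nodes.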
Prompt engineering didn’t die. It moved.
It’s not extinct; it relocated. IBM Technology’s “RAG vs. Fine-Tuning vs. Prompt Engineering,” sitting at 640K views, shows that people still want to know how the pieces fit together.
The new mental model: control flow holds the skeleton. Prompts handle precision inside each node. RAG injects context where the model needs it. Fine-tuning earns its keep only when domain adaptation goes deeper than retrieval can reach. Stop trying to make one prompt do a hundred jobs. Use each tool for what it’s actually good at.
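Inside a single node, “RAG injects context” can be as simple as retrieving a few relevant snippets and placing them in a narrow prompt. This sketch assumes a toy word-overlap retriever standing in for a real embedding or vector search, and the knowledge-base snippets are hypothetical.

```python
import re
from typing import Dict, List, Set


def _words(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, docs: Dict[str, str], k: int = 2) -> List[str]:
    # Toy retriever: rank documents by how many words they share with the query.
    # A production system would use embeddings and a vector store instead.
    q = _words(query)
    ranked = sorted(docs, key=lambda name: len(q & _words(docs[name])), reverse=True)
    return ranked[:k]


def build_node_prompt(task: str, question: str, docs: Dict[str, str], k: int = 2) -> str:
    # Control flow decides THAT this node needs context; retrieval decides WHICH.
    context = "\n".join(f"[{name}] {docs[name]}" for name in retrieve(question, docs, k))
    return f"{task}\n\nContext:\n{context}\n\nQuestion: {question}"


DOCS = {  # hypothetical knowledge-base snippets
    "refunds": "Refunds are allowed within 30 days of delivery.",
    "shipping": "Standard shipping takes 5 business days.",
    "privacy": "We never sell customer data.",
}

print(build_node_prompt(
    "Answer using only the context. If it is not covered, say so.",
    "How many days do I have to ask for refunds?",
    DOCS,
))
```

The prompt stays small and task-specific; only the snippets change per request.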
Inside an agent, prompts get shorter and narrower. “At this step, classify user intent into one of these five buckets” beats a 200-line god-prompt every time. Smaller scope means easier evals, easier debugging, easier replacement when a better model ships next quarter.
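A narrow node might look like this sketch. The five intent labels are hypothetical, and call_model is any function that sends a prompt to your LLM and returns its text; the example passes lambdas so it runs without a model. The key line is the return in classify: the output is validated in code, so an off-script answer cannot leak into the rest of the flow.

```python
from typing import Callable

# Hypothetical label set for the "five buckets" classification step.
INTENTS = ("refund", "order_status", "billing", "technical_issue", "other")


def intent_prompt(message: str) -> str:
    labels = ", ".join(INTENTS)
    return (
        f"Classify the user's intent as exactly one of: {labels}.\n"
        "Reply with the label only.\n\n"
        f"Message: {message}"
    )


def classify(message: str, call_model: Callable[[str], str]) -> str:
    raw = call_model(intent_prompt(message))
    label = raw.strip().lower()
    # Validate in code: anything outside the allowed set falls back to "other".
    return label if label in INTENTS else "other"


# Usage with stand-in "models" so the sketch runs offline.
print(classify("Where is my parcel?", lambda prompt: " Order_Status\n"))  # order_status
print(classify("Where is my parcel?", lambda prompt: "Probably shipping"))  # other
```

Because the step has a closed output set, evaluating it reduces to a list of (message, expected label) pairs and an accuracy count.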
What to do Monday morning
If you’re staring at an agent codebase right now, ask one question on every step: does the LLM actually need to decide this? If the answer is no, that step belongs in code. Reserve the model for natural-language understanding, generation, and the genuinely ambiguous classification calls.
And a heuristic worth tattooing on your monitor: if your prompt is past 200 lines, the architecture is wrong. A single call is carrying too much responsibility. Split the steps. Connect them with code. You’ll almost always end up faster, cheaper, and more debuggable.
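The 200-line rule is easy to enforce mechanically. This sketch assumes, hypothetically, that prompts live as .txt files in a single directory; it reports every file over the limit so a CI job can fail the build and force the split.

```python
from pathlib import Path
from typing import List, Tuple

MAX_PROMPT_LINES = 200  # the rule of thumb from the text


def oversized_prompts(prompt_dir: str, limit: int = MAX_PROMPT_LINES) -> List[Tuple[str, int]]:
    """Return (file name, line count) for every prompt file longer than `limit` lines."""
    flagged = []
    for path in sorted(Path(prompt_dir).glob("*.txt")):
        count = len(path.read_text(encoding="utf-8").splitlines())
        if count > limit:
            flagged.append((path.name, count))
    return flagged
```

In CI you might call oversized_prompts("prompts") and exit non-zero when the list is non-empty; the directory name and file extension are assumptions to adapt.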
The competitive edge in agents won’t come from whoever has the smartest model — everyone gets the same models. It’ll come from whoever places that intelligence most precisely. Next time you reach to add a line to your system prompt, pause and ask whether it could just be an if statement instead. Where does your agent sit right now — drowning in prompt sprawl, or built as a clean flow with the LLM dropped in like a scalpel?