Is Local LLM on a MacBook Actually Cheaper Than OpenRouter? I Ran the Numbers
“Why pay for ChatGPT when I have a MacBook Pro sitting right here?” It’s the question echoing through every dev Slack and Hacker News thread these days. But once you actually plug in the numbers — electricity, hardware depreciation, the works — the conventional wisdom starts looking shaky. Today I’m taking a calculator to the claim that local LLMs beat cloud APIs on cost.
What a Token Actually Costs on Apple Silicon
Start with the raw throughput. An M4 Mac Mini or M3 Max MacBook Pro running Llama 3.1 70B churns out roughly 8 to 15 tokens per second. That’s “friend typing slowly on iMessage” speed — fine for solo work, painful for anything interactive.
Then there’s the power draw. An M3 Max under inference load pulls around 70 to 90 watts; the M4 Mac Mini sits closer to 35 to 50 watts. At a typical US residential rate of about $0.16 per kWh, an hour of full-tilt inference runs you somewhere between half a cent and a cent and a half.
Sounds trivial. But normalize to tokens. At maybe 30,000 to 50,000 tokens per hour (8 to 15 tokens per second, times 3,600 seconds), electricity alone on the M3 Max works out to roughly $0.20 to $0.50 per million tokens, with the slow, power-hungry end of that range landing near $0.50.
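For anyone who wants to redo the electricity math with their own numbers, here is a minimal sketch. The function name is made up for illustration, and pairing the highest wattage with the lowest throughput (and the reverse) is an assumption used to bracket the range, not a measurement. Power and speed figures are the rough ones quoted above.

```python
def electricity_cost_per_million_tokens(watts, tokens_per_second, usd_per_kwh=0.16):
    """Electricity cost in USD to generate one million tokens.

    watts: sustained power draw under inference load
    tokens_per_second: sustained generation speed
    usd_per_kwh: electricity price (0.16 is the rough US residential rate used above)
    """
    usd_per_hour = (watts / 1000.0) * usd_per_kwh   # kW times $/kWh
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000


# Two corners of the M3 Max range: slow and power-hungry, fast and frugal.
for watts, tps in [(90, 8), (70, 15)]:
    cost = electricity_cost_per_million_tokens(watts, tps)
    print(f"{watts} W at {tps} tok/s: ${cost:.2f} per million tokens")
```

The slow, power-hungry corner (90 W at 8 tokens per second) comes out at about $0.50 per million tokens; faster or more efficient setups land below that. Swap in your own utility rate and measured throughput to see where your machine falls.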
What OpenRouter Charges for the Same Model
Now look at the cloud side. The same Llama 3.1 70B on OpenRouter goes for around $0.40 to $0.60 per million tokens. DeepSeek V3 drops as low as $0.27 per million. Frontier-class models are pricier, but the mid-tier open-weights story is brutal for local.
Here’s the first gut punch: on electricity alone, your MacBook is barely matching cloud pricing. Hyperscalers running H100 clusters at hundreds of tokens per second are passing the economies of scale down to you faster than your laptop fans can spin up.
The Hidden Cost: Hardware Depreciation
This is where it gets ugly. An M3 Max MacBook Pro with 36GB runs about $3,200; the 64GB+ configs blow past $4,000. If you’re seriously calling it an AI workstation, three-year depreciation is the conservative accounting.
Say you run inference two hours a day for three years. That’s roughly 2,190 hours of compute, or about 100 million tokens of total output at the faster end of the throughput range. Spread the $3,200 hardware cost across that, with no resale value assumed, and you’re adding around $32 per million tokens in depreciation.
Electricity ($0.50) plus depreciation ($32) lands you at roughly $32.50 per million tokens for local, against roughly $0.50 on OpenRouter. That makes local about 65 times more expensive, not cheaper.
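The arithmetic behind those figures fits in a few lines. This is only a sketch of the article’s own back-of-the-envelope model: the variable names are invented for illustration, the hardware is assumed to have zero resale value after three years, and the 100 million token total is the rounded figure from above rather than a measured one.

```python
# Inputs taken from the figures quoted in the text.
HARDWARE_USD = 3200                    # M3 Max MacBook Pro, 36GB
HOURS_PER_DAY = 2
YEARS = 3
TOTAL_TOKENS = 100_000_000             # rounded lifetime output
ELECTRICITY_USD_PER_MILLION = 0.50
CLOUD_USD_PER_MILLION = 0.50           # mid-range OpenRouter price for Llama 3.1 70B

total_hours = HOURS_PER_DAY * 365 * YEARS

# Straight-line depreciation spread over every token generated (no resale value).
depreciation_usd_per_million = HARDWARE_USD / (TOTAL_TOKENS / 1_000_000)

local_usd_per_million = ELECTRICITY_USD_PER_MILLION + depreciation_usd_per_million
ratio = local_usd_per_million / CLOUD_USD_PER_MILLION

print(f"Compute hours over {YEARS} years: {total_hours}")
print(f"Depreciation: ${depreciation_usd_per_million:.2f} per million tokens")
print(f"Local total:  ${local_usd_per_million:.2f} per million tokens")
print(f"Local is {ratio:.0f}x the cloud price")
```

Any resale value at the end of the three years would shrink the depreciation term, so treat the result as an upper bound for this usage pattern.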
So Why Does Anyone Run Local?
Because cost was never the real reason. Channels like Alex Ziskind’s pull six-figure view counts on “why local LLMs are 10x slower” content for a reason — the people clicking aren’t optimizing for dollars per token. They’re optimizing for privacy and control.
Air-gapped enterprise environments. HIPAA-regulated medical data. EU clients with strict data residency requirements. Workflows that have to keep running on a plane. In those contexts, $32 per million tokens is a bargain compared to a compliance review. There’s also the marginal-cost argument: if you bought the MacBook for work anyway, only electricity counts on the margin.
The interesting twist is how fast the sub-$1,000 AI hardware market is fragmenting. M4 Mac Minis, AI PCs with 96GB DDR5, AMD-based mini servers running ROCm — each carving out a niche for different inference workloads.
The Real Question Isn’t “How Much”
Bottom line: “local is cheaper” is an electricity-only illusion. Once hardware costs enter the equation honestly, OpenRouter and its peers crush local inference for anyone with light-to-moderate usage.
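How much does heavier usage change the picture? The sketch below reruns the same model at different daily usage levels. It assumes a steady 12 tokens per second (a mid-range guess inside the 8 to 15 range quoted earlier), the $3,200 machine, three-year depreciation with no resale value, and $0.50 per million tokens for electricity. The function name is hypothetical.

```python
def local_cost_per_million_tokens(
    hours_per_day,
    tokens_per_second=12.0,            # assumed mid-range throughput
    hardware_usd=3200.0,
    years=3,
    electricity_usd_per_million=0.50,
):
    """Depreciation plus electricity, in USD per million tokens."""
    total_hours = hours_per_day * 365 * years
    total_millions = total_hours * tokens_per_second * 3600 / 1_000_000
    return hardware_usd / total_millions + electricity_usd_per_million


for hours in (2, 8, 24):
    cost = local_cost_per_million_tokens(hours)
    print(f"{hours:>2} h/day: ${cost:.2f} per million tokens")
```

Under these assumptions the per-token cost falls steadily as usage rises, but even a machine generating around the clock stays above $3 per million tokens, still several times the roughly $0.50 cloud price.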
If you still pick local, the justification has to be data sovereignty, latency control, offline operation — anything but the bill. So what is your MacBook actually generating tokens for tonight? Worth asking whether it’s pulling its $32-per-million weight.