Hot Takes

DeepSeek V4 Came In 89× Cheaper Than Claude And Open-Sourced The Whole Thing

April 25, 2026 6 min read

April 24, 2026 was the day Silicon Valley’s pricing strategy stopped making sense. OpenAI used the morning to launch GPT-5.5 — repackaged as the “super app,” available through ChatGPT for whatever subscription tier you happen to be paying. DeepSeek used the same morning to drop V4-Pro and V4-Flash on Hugging Face under MIT license, with open weights, a one-million-token context window as standard, and an output price of $0.28 per million tokens on the smaller model.

That last number is not a typo and the comparison is not subtle. Claude Opus 4.6 charges $25 per million output tokens. Run ten million tokens through V4-Flash and you spend $2.80. Run the same workload through Claude and you spend $250. 89× cheaper, available to download, MIT-licensed. Same day, same hour, completely different physics.

I want to walk through what was actually released, because the headline is loud but the details are louder.

What dropped

Hugging Face: DeepSeek-V4 model card with MIT license badge and Introduction text
The DeepSeek-V4-Pro model card on Hugging Face. License: MIT. 78,864 downloads in the first day. The Technical Report and Open Weights links work. There is no waitlist, no commercial-use threshold, no seat license.

DeepSeek V4 is a two-model family, both Mixture-of-Experts, both open-weight on Hugging Face, both shipping with the long-context architecture as the default rather than as an afterthought.

V4-Pro is the headliner: 1.6 trillion total parameters with 49 billion activated per token. That makes it the largest open-weight model that exists right now — bigger than Moonshot’s Kimi K 2.6 (1.1T), more than double DeepSeek’s own V3.2. The architecture is built around 1M-token native context (not retrieved, not slid, not bolted on) and includes a dual Thinking / Non-Thinking mode toggle similar to Claude’s extended thinking.

V4-Flash is the smaller sibling: 284 billion total, 13 billion active. It’s still bigger than most flagship Western open-weight models and exposes the same 1M context window. Both come with a thin API wrapper at api.deepseek.com if you don’t want to host them yourself.

The license is MIT. There is no “research only” clause, no “non-commercial” clause, no commercial-use threshold above which you have to email a sales team. You download the weights, you run them, you ship a product. That is the entire deal.

The benchmarks

DeepSeek API docs: V4 Preview Release page with spec table and benchmark chart vs Claude/GPT-5.4/Gemini
DeepSeek’s own release page does not bury the comparison. Performance bars show V4-Pro-Max next to Claude Opus 4.6-Max, GPT-5.4-xHigh and Gemini 3.1-Pro-High. They are not pretending to be a tier below the frontier — they are claiming the same row.

DeepSeek’s release was timed for the GPT-5.5 launch on purpose, and the benchmarks make the reason obvious. On LiveCodeBench, V4-Pro scores 93.5, ahead of Gemini (91.7) and Claude Opus (88.8). On Codeforces — the live competitive programming rating — V4-Pro lands at 3206, a hair above GPT-5.4 (3168) and well above Gemini (3052). On agent tasks, DeepSeek’s own internal eval claims V4-Pro-Max “outperforms Claude Sonnet 4.5 and approaches the level of Opus 4.5,” which is the kind of language a Chinese lab does not use lightly.

The picture is not all clean wins. On SWE-bench Verified — multi-file software engineering with real GitHub issues — V4-Pro lands at 80.6%, marginally behind Claude Opus 4.6 at 80.8%. Effectively a tie, but Claude is still the king of long-horizon multi-file refactors. On reasoning benchmarks, V4-Pro scores 8.17 against Claude Opus 4.6 standard at 8.18, then gets blown out by Claude Opus 4.6 Thinking at 8.82. The thinking-mode gap is real.

The honest synthesis: DeepSeek V4-Pro is roughly 3-6 months behind frontier on pure capability, ahead of frontier on coding speed and Codeforces, and operating in a different universe on price. That’s a coherent strategic position. It is also a gift to anyone building products on top of LLMs.

The pricing humiliation

This is the part Western labs need to read twice.

Model Input ($/M tokens) Output ($/M tokens)
DeepSeek V4-Flash $0.14 $0.28
DeepSeek V4-Pro $1.74 $3.48
Claude Opus 4.6 ~$15 ~$25
GPT-5.5 Pro ~$10 ~$30

A million output tokens is roughly the size of an average novel. V4-Flash will write that for the price of a coffee that has gone cold. Claude will write the same novel for the price of a flight to Berlin. Both numbers are real, both APIs are public, both shipped on the same day in April.

Simon Willison: DeepSeek V4 — almost on the frontier, a fraction of the price (article body)
Simon Willison’s post, dated April 24, 2026. The headline is the verdict, but read the body: “I think this makes DeepSeek-V4-Pro the new largest open weights model. It’s larger than Kimi K2.6 (1.1T) and GLM-5.1 (754B) and more than twice the size of DeepSeek V3.2 (685B).”

Simon Willison’s verdict, written within hours of the launch: “DeepSeek V4 — almost on the frontier, a fraction of the price.” He is being polite. The actual situation is that DeepSeek has demonstrated, for the second year running, that the cost-to-capability curve Western labs are pricing against is a curve they themselves invented and that nobody outside their billing system is required to respect.

The moat is a puddle

For two years the consensus story in Western AI has been that frontier models are expensive because they are hard. That training compute is gated, post-training is a craft, alignment is a moat. There has been a steady, cheerful narrative in podcast appearances and S-1-adjacent investor decks about how the leaders extend their lead each year and how the followers — DeepSeek, Mistral, Qwen, Kimi — are perpetually a generation behind, useful as commodity infrastructure but not for “real” frontier work.

What April 24 did is end that narrative as a defensible position.

DeepSeek V4-Pro is, by their own admission, behind frontier — by three to six months. It is also free to download, runs on Huawei chips, integrates with Chinese cloud stacks at a fraction of the inference cost, and ships a 1M context window that Claude doesn’t yet match. A three-to-six month gap that you ship for free, in MIT-licensed weights, is not a moat. It’s a reminder that everyone involved in this race is running on the same physics.

The implication for Western labs is uncomfortable. Anthropic is currently raising $40B from Google. OpenAI is at $25B run-rate revenue, reportedly prepping a public listing. Both companies’ valuations are predicated on a moat that DeepSeek just open-sourced an 89×-cheaper version of. Either Western labs find a real differentiator — agent capability, computer use, reliability at horizon — or the price floor on inference falls until subscription pricing stops covering compute.

The Blunt takeaway

I am not saying DeepSeek V4 is the best model. It isn’t, on the metrics where Claude and GPT-5.5 still lead. I am saying that the gap between best and good-enough is now the difference between $25 and $0.28, and a lot of products will pick the cheaper number.

If you build with LLMs — agent tools, copilots, content workflows, internal automations — you have a homework assignment this weekend. Wire DeepSeek V4-Flash into one of your pipelines. Run a few thousand calls. See what breaks. The cost is rounding-error and the lesson, whichever way it goes, is worth more than the inference bill.

If you sit in a Western AI lab pricing meeting, you also have a homework assignment. Open the V4 model card on Hugging Face. Read the benchmarks. Then explain, out loud, to your boss, why your output token costs $25.

My rating on this drop: Shut up and try it. Not because V4 is going to replace your frontier model in production — it probably won’t. But because the price floor in the LLM market changed last week and pretending otherwise is the kind of complacency that gets your company priced into a corner.