- Top-tier models like Anthropic’s Mythos can outperform public models significantly, but the limiting factor is often inference compute economics, not just safety or artificial scarcity.
- AI labs effectively split their compute budgets between training future models and serving current users, creating a zero-sum tradeoff between revenue today and capability tomorrow.
- Because capacity is bought and planned far in advance, demand forecasting becomes a high-stakes problem: overbuy wastes cash, underbuy starves research and future competitiveness.
- The most transformative AI use cases typically require long, expensive inference (more reasoning steps, verification, exploration), which makes them hard to offer at consumer scale.
- As labs prioritize workloads with the best ROI, many impressive demo-level capabilities remain gated or unavailable broadly until inference costs drop or pricing changes.

Anthropic's Mythos scores 15 points higher than Opus on coding benchmarks. It exists. Only a handful of organizations can access it through Project Glasswing. The main reason isn't safety theater or artificial scarcity. Anthropic can't afford to serve it at a price that makes sense.
The model is real. The compute budget to let you use it isn't. Understanding why changes how you think about every AI product decision you make.
The 50/50 split
Dario Amodei laid this out in a conversation with Dwarkesh Patel: take all the compute an AI company has bought. Roughly half goes to training: building the next model, running research experiments, pushing the capability frontier forward. The other half goes to inference: serving users when they ask Claude a question, generate code, or analyze a document.
Inference makes the money. Training makes the future. Both compete for the same pool of chips.
The constraint gets worse with timing. Anthropic buys data center capacity a year in advance. Overestimate customer demand and inference capacity sits idle, burning cash with no revenue. Underestimate and you're profitable today but you've starved the research budget that produces next year's models.
Dario called it a "hellish demand prediction problem." Billion-dollar allocation decisions today that constrain what's possible twelve months from now, in a market where demand patterns change quarterly.
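The asymmetry in that forecasting problem can be sketched in a few lines. This is a toy model, not anything Anthropic has described: the function name, the unit-less compute numbers, and the assumption that unused inference capacity cannot be repurposed for training are all hypothetical.

```python
def year_outcome(forecast_demand, actual_demand, training_target):
    """Toy model of buying compute a year ahead (all units hypothetical).

    Capacity is purchased as forecast_demand + training_target. When real
    demand arrives, inference is served first and training gets the rest.
    """
    purchased = forecast_demand + training_target
    served = min(actual_demand, purchased)
    if actual_demand <= forecast_demand:
        # Overestimate: the inference capacity nobody uses sits idle
        # (assumed here to be unusable for training).
        idle = forecast_demand - actual_demand
        training = training_target
    else:
        # Underestimate: serving the extra demand eats into research compute.
        idle = 0
        training = purchased - served
    return {"purchased": purchased, "served": served,
            "idle": idle, "training": training}


if __name__ == "__main__":
    # Same purchase plan, three different realities.
    for actual in (30, 50, 70):
        print(actual, year_outcome(forecast_demand=50,
                                   actual_demand=actual,
                                   training_target=50))
```

In this sketch, an overestimate shows up as idle capacity and an underestimate shows up as a smaller training budget, which is the tradeoff described above.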
Why Mythos is gated
The public narrative focused on safety: Mythos has dangerous cyber capabilities, so Anthropic is being responsible by restricting access. That's partially true. The cyber capabilities are real. The preparedness framework is real.
But the compute economics are the bigger constraint. Anthropic's own language is that Mythos is "very expensive to serve." Serving Mythos at consumer scale, at any price point that makes commercial sense, would consume inference compute Anthropic needs for everything else. This is the 50/50 split made tangible: the model exists, but the budget to let a billion people use it doesn't.
OpenAI faced the same constraint more visibly. They killed Sora entirely. Roughly $1 million per day in compute, fewer than 500,000 users. They looked at a fixed pool of chips and decided: video generation for consumers loses. Coding agents for developers wins.
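The Sora figures above imply a floor on per-user cost. A quick back-of-the-envelope check, using only the two numbers cited (the function name is illustrative, and treating "users" as the total user base is an assumption, since the figure isn't broken down further):

```python
def min_cost_per_user(daily_compute_usd, user_upper_bound):
    """Lower bound on daily compute cost per user.

    Because the user count is an upper bound ("fewer than"), the true
    per-user cost is at least this value.
    """
    return daily_compute_usd / user_upper_bound


daily = min_cost_per_user(1_000_000, 500_000)
monthly = daily * 30  # assuming a 30-day month

print(f"at least ${daily:.2f} per user per day")      # at least $2.00 per user per day
print(f"at least ${monthly:.2f} per user per month")  # at least $60.00 per user per month
```

At a minimum of roughly $60 of compute per user per month, a consumer product is hard to sustain unless most users pay well above typical subscription prices.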
Every lab will face this decision. Most already have, quietly. The workloads that survive the cut are the ones the labs believe produce the highest return, in revenue, strategic positioning, or training signal for future models.
The disconnect you can feel
There's a gap between what AI can do in a demo and what AI does in your daily workflow. Right now, a huge share of inference compute powers customer service bots, content generation, and coding assistance: work that matters, but that sits in a particular band of complexity. The capabilities that could genuinely change things (finding critical security vulnerabilities, running multi-step scientific reasoning, doing research that produces novel insights) require the expensive models, the long inference chains, and the reasoning that burns 10x or 100x more compute per query.
Those are exactly the workloads the economics don't support at scale yet. This is the disconnect a PwC study quantified: 74% of AI's economic value flows to 20% of companies. Part of that gap is organizational. But part of it is structural: the highest-value applications of AI are the ones that cost the most to run, which means they're available to the fewest users.
Smarter costs more, not less
The next wave of capability comes from reinforcement learning and inference-time compute, letting models think longer on hard problems. Both cost more per query. Not less. More.
A base model that generates text is cheap to serve. A model that reasons through a chain of thought, checks its work, tries alternative approaches, and synthesizes a conclusion burns significantly more compute per question. The smarter the model, the more expensive each interaction.
This inverts the expectation most people have about technology. The hardware gets cheaper per FLOP. But the workloads that matter are growing in compute demand faster than hardware prices are falling. The net cost of the queries you actually care about is going up. The models that get cheaper are the ones from two generations ago, which is fine for many use cases but won't help if you need frontier capability.
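The race between falling hardware prices and rising compute per query can be written as a product of two factors. The sketch below is illustrative only: the 30% annual price decline and 3x annual compute growth are made-up placeholder rates, not measured figures.

```python
def relative_query_cost(years, price_decline_per_year, compute_growth_per_year):
    """Cost of a frontier query relative to today (1.0 = unchanged).

    price_decline_per_year: fractional drop in cost per FLOP each year (0.3 = 30%).
    compute_growth_per_year: multiplier on FLOPs per query each year (3.0 = 3x).
    Both rates are hypothetical inputs, not estimates.
    """
    price_factor = (1.0 - price_decline_per_year) ** years
    compute_factor = compute_growth_per_year ** years
    return price_factor * compute_factor


if __name__ == "__main__":
    for year in range(4):
        frontier = relative_query_cost(year, 0.3, 3.0)  # workload keeps growing
        legacy = relative_query_cost(year, 0.3, 1.0)    # workload stays fixed
        print(f"year {year}: frontier x{frontier:.2f}, legacy x{legacy:.2f}")
```

Whenever compute per query grows faster than price per FLOP falls, the frontier line rises while the fixed-workload line falls, which matches the pattern of older models getting cheaper while frontier queries get more expensive.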
What this means
The model you're using is not the best model that exists. It's the best model your provider can afford to serve you at your price point. The gap between what exists and what you have access to is real and growing.
Watch what gets killed or gated. Sora's shutdown and Mythos's restriction are signals about where compute is flowing. If your use case is "nice to have but compute-intensive," it might be the next Sora.
The ceiling on your AI experience isn't the model's capability. It's the economics of serving that capability to you.
When you evaluate AI tools, assume the best capability may exist but be inaccessible or throttled because it's too expensive to serve at scale. Make product and workflow choices that are robust to model availability: design fallbacks, budget for premium tiers, and prioritize tasks where cheaper models already deliver reliable value. Watch for gating, usage caps, or sudden product shutdowns as indicators of compute reallocation, and treat "reasoning" features as a cost driver that may affect pricing and reliability over time.
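One way to make a workflow robust to model availability is an ordered fallback across tiers. The following is a minimal sketch, not any vendor's API: `Tier`, `ModelUnavailable`, `run_with_fallback`, and the per-call prices are all hypothetical names and numbers, and each `call` stands in for whatever client function you actually use.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


class ModelUnavailable(Exception):
    """Hypothetical error for a tier that is gated, capped, or shut down."""


@dataclass
class Tier:
    name: str
    cost_per_call_usd: float
    call: Callable[[str], str]  # stand-in for a real client call


def run_with_fallback(prompt: str, tiers: Sequence[Tier],
                      budget_usd: float) -> Tuple[str, str]:
    """Try tiers in preference order; skip any that are too costly or unavailable.

    Returns (tier_name, response) from the first tier that succeeds.
    """
    for tier in tiers:
        if tier.cost_per_call_usd > budget_usd:
            continue  # over budget for this task
        try:
            return tier.name, tier.call(prompt)
        except ModelUnavailable:
            continue  # gated or throttled: fall through to the next tier
    raise ModelUnavailable("no tier available within budget")
```

Listing the most capable tier first and the cheapest last means a gated or over-budget frontier model degrades the result rather than breaking the workflow.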