- Moving from Opus to a Capybara-tier model is framed as a step-change, because recursive self-correction could eliminate common failure modes that currently require constant human supervision.
- Opus 4.6 tends to break down on long, dependency-heavy, multi-step work (e.g., large refactors or end-to-end test generation), making outcomes feel like coin flips rather than reliable workflows.
- If self-correction works as described, success rates on complex tasks could jump from ~60% to ~95%, turning “experiments” into processes you can actually trust.
- Higher pricing (2–3x Opus) is expected to create a sharp market split: premium models reserved for high-value, high-risk tasks while cheaper tiers handle everything else.
- The article flags a governance gap: ASL-4 isn’t defined even though a model with potentially ASL-4-level cyber capabilities may already exist, raising safety and policy urgency.
Three things you'll walk away with after reading this:
- The jump from Opus to Capybara isn't incremental. Recursive self-correction changes what you can trust a model to do without babysitting it.
- The pricing creates a hard market split. At 2-3x the cost of Opus, Capybara divides work into tasks worth paying a premium for and everything else. That split will reshape how companies staff AI work.
- Anthropic's safety framework has a gap it hasn't closed. ASL-4 isn't defined. The model that might require it already exists. That's not a technicality.
Anyone who has spent real time building with Opus 4.6 knows where it cracks. Not in theory, not on benchmarks, but in the middle of a complex refactor when the model confidently rewrites a dependency chain that worked perfectly fine and breaks three services in the process. Those failure modes are predictable enough that experienced users route around them automatically. The interesting question about Mythos isn't whether it scores higher on evaluations. It's whether it eliminates the failure patterns that force you to babysit the model through every non-trivial task.
The ceiling you can feel
Opus 4.6 handles multi-file codebases, reasons through long documents, and powers agentic workflows that mostly work. But "mostly" is load-bearing. The model degrades on tasks requiring 15-20 sequential steps where each step depends on adapting to new information from the previous one. It makes architectural calls with high confidence and low accuracy on large codebases because it can't hold the full dependency graph. And it struggles to backtrack. When correcting course means admitting an earlier decision was wrong, Opus tends to patch around the mistake rather than rethink the approach. Try a 30-file refactor that has to preserve existing behavior, or a complete integration test suite generated from API documentation alone. These succeed roughly 60% of the time. That 40% failure rate makes them experiments, not workflows. You can't build a reliable process on coin-flip reliability. Anthropic's leaked documentation describes recursive self-correction: the model spots its own errors and fixes them without waiting for a human to intervene. If that capability works as advertised, those 60% tasks climb toward 95%. The distance between 60 and 95 is the distance between a tool you try and a tool you trust.
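To see why the distance between 60% and 95% matters so much, run the arithmetic on sequential steps. The sketch below is illustrative only: it assumes steps succeed or fail independently, which real agent runs don't strictly do, and the step count is just the midpoint of the 15-20 range above.

```python
# Illustrative arithmetic only. Assumes each step succeeds or fails
# independently; all numbers are hypothetical, not from the leaked draft.

def overall_success(per_step: float, steps: int) -> float:
    """Probability that every one of `steps` sequential steps succeeds."""
    return per_step ** steps

def per_step_needed(target: float, steps: int) -> float:
    """Per-step reliability required to hit an overall success target."""
    return target ** (1 / steps)

steps = 18  # midpoint of the 15-20 step range where Opus 4.6 degrades

# ~60% overall success on an 18-step task implies ~97% per-step accuracy...
print(f"per-step for 60% overall: {per_step_needed(0.60, steps):.3f}")
# ...while ~95% overall requires ~99.7% per-step, i.e. catching nearly every slip.
print(f"per-step for 95% overall: {per_step_needed(0.95, steps):.3f}")

# Equivalently: recovering most per-step errors moves the overall number
# far more than it moves the per-step number.
print(f"overall at 97.0% per-step: {overall_success(0.970, steps):.2f}")
print(f"overall at 99.7% per-step: {overall_success(0.997, steps):.2f}")
```

The point of the exercise: self-correction doesn't need to make each step much better, it only needs to catch the rare per-step slip before it compounds.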
What changes when the model catches its own mistakes
The actual cognitive cost of working with agentic AI isn't the work itself. It's monitoring the model while it works. You check after every significant step. You redirect when it veers off track. You catch the wrong decisions it makes with complete certainty. A model that identifies and corrects its own errors transforms that supervision dynamic. Autonomous workflows can run longer. The exhausting cycle of "fix this, now fix what you just broke, now revert to the version before that," which anyone using Claude Code has endured for entire afternoons, gets shorter. The supervision distance stretches. In cybersecurity, this shift becomes tangible and concerning in a specific way. A model that can consistently chain vulnerability discovery with exploit creation and lateral movement across network segments is categorically different from one that occasionally handles fragments of that process. The leaked draft's description of Mythos being "far ahead of any other AI model in cyber capabilities" suggests the consistency threshold has been crossed. The issue isn't what the model can do on its best run. It's what it can do on every run.
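A rough way to picture that supervision shift is the control loop below. This is a hypothetical sketch, not Anthropic's API: the function names, the self-critique step, and the retry budget are all assumptions, and the model calls are left as placeholders to wire to whatever client you use.

```python
# Hypothetical agent loop: illustrates the supervision shift, not a real API.
from dataclasses import dataclass

@dataclass
class StepResult:
    output: str
    ok: bool          # did the model's own critique pass?
    critique: str     # what the self-check flagged, if anything

def run_step(task: str, feedback: str | None = None) -> str:
    """Placeholder for a model call that attempts one step of the task."""
    ...

def self_check(task: str, output: str) -> StepResult:
    """Placeholder for a second pass where the model critiques its own output."""
    ...

def execute(plan: list[str], max_retries: int = 2) -> None:
    for step in plan:
        feedback = None
        for _ in range(max_retries + 1):
            output = run_step(step, feedback)
            result = self_check(step, output)
            if result.ok:
                break                    # model caught no problems; move on
            feedback = result.critique   # feed the critique back and retry
        else:
            # Only here does a human get pulled in: the model failed to
            # self-correct within budget. Supervision becomes exception
            # handling rather than per-step review.
            raise RuntimeError(f"Needs human review: {step}\n{result.critique}")
```

In the current regime the human sits where `self_check` sits, on every step. Moving that check into the model is what lets autonomous runs stretch out.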
A split market, not a gradient
Capybara-tier pricing won't be gentle. Opus 4.6 runs $15 per million input tokens. Early projections place Capybara between $30 and $45, with output tokens potentially steeper. That pricing structure creates a sharp division rather than a smooth spectrum. A principal engineer evaluating architectural trade-offs, or a security researcher conducting vulnerability analysis, will absorb the cost without hesitation because the capability gap justifies it. But the overwhelming majority of tasks where Opus or Sonnet performs adequately won't migrate upward. Paying triple for capacity you don't use is waste, not investment. The more revealing pattern is which roles benefit most. Capybara excels at senior-level work. An experienced engineer billing $200 per hour still comes out ahead when the model compresses two hours of work into ten minutes, even at $40 per million tokens. A junior developer handling routine implementation? The economics collapse. That dynamic has workforce implications. Organizations will likely concentrate investment in fewer, more experienced practitioners who know how to direct expensive models effectively, rather than expanding teams that use cheaper ones. Capybara amplifies people who are already at the top of the capability ladder. It doesn't extend the ladder downward.
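The break-even logic in that paragraph is easy to sanity-check. The sketch below uses the article's per-token prices and the $200-per-hour, two-hours-to-ten-minutes example; the task size in tokens and the junior hourly rate are placeholder assumptions of mine.

```python
# Back-of-envelope cost comparison. Token volumes and the junior hourly rate
# are assumptions for illustration; the per-token prices come from the article.

def task_cost(tokens_millions: float, price_per_m: float) -> float:
    return tokens_millions * price_per_m

# Senior engineer case: $200/hr, two hours of work compressed to ~10 minutes
# of review, assuming the task burns ~2M tokens end to end.
hours_saved = 2 - 10 / 60
labor_saved = hours_saved * 200          # ~ $367 of engineer time
capybara_cost = task_cost(2, 40)         # ~ $80 at the projected $40/M rate
opus_cost = task_cost(2, 15)             # ~ $30 at today's $15/M rate

print(f"value of time saved: ${labor_saved:.0f}")
print(f"premium-model cost:  ${capybara_cost:.0f} (net ~ ${labor_saved - capybara_cost:.0f})")

# Junior developer case: assume $50/hr on routine implementation where Opus is
# already adequate. The same task at the premium rate eats most of the labor
# value, so the upgrade doesn't pay for itself.
junior_saved = hours_saved * 50          # ~ $92
print(f"junior net at $40/M: ${junior_saved - capybara_cost:.0f}")
print(f"junior net at $15/M: ${junior_saved - opus_cost:.0f}")
```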
The missing safety definition
Anthropic's Responsible Scaling Policy categorizes models by catastrophic misuse potential. ASL-3 applies to models that "substantially increase" risks in cybersecurity, biology, or radiological domains. ASL-4 covers models that become a primary source of national security risk. The leaked draft places Mythos against the upper boundary of ASL-3. Whether it crosses into ASL-4 territory is a question Anthropic hasn't addressed publicly. The complication: ASL-4 doesn't have a definition yet. Anthropic committed to establishing one before any model triggered ASL-3, but the updated RSP from May 2025 contained no ASL-4 specification, even though the company was already classifying Opus 4.6 as ASL-3. Now a model that may require ASL-4 classification exists, and the public framework for evaluating it does not. Triggering ASL-4 would mean more rigorous safety evaluation and would publicly acknowledge how far ahead Anthropic believes it is. Both are outcomes a company might prefer to postpone. Private briefings to government officials suggest the internal assessment is serious. The public documentation tells a different story. The EA Forum flagged this tension explicitly, arguing that Anthropic is "quietly backpedalling on its safety commitments." Regardless of whether that characterization is fair, the perception problem is real. You don't want to write the rules after the thing that might break them already exists.
A capability-class problem
Coverage has framed Mythos as an Anthropic story. It's really a frontier-capability story. If Anthropic has reached Capybara-tier, OpenAI and Google are building toward it. The capability isn't unique to one architecture or one training run. It emerges from scale, data, and compute that multiple labs can access. The cybersecurity risks Anthropic is flagging aren't specific to Mythos. They're what happens when any model reaches this performance class. By warning about its own model, Anthropic is effectively warning about every model that arrives at this level. OpenAI's next generation, Google's next Gemini, any sufficiently resourced competitor will face identical questions. The difference is that Anthropic committed the warning to paper (accidentally) while the others haven't. Check Point's threat analysis framed it as an "AI attack factory" where threat actors scan systems continuously and generate novel attack vectors at scale. That factory isn't an Anthropic product. It's a capability threshold. Once one lab crosses it, the countdown begins for everyone else.
Practical implications for builders
For coding work, anticipate a model that handles genuine architectural reasoning rather than just function-level generation. Multi-file refactors that currently require you to maintain the plan and guide each step could become single-prompt operations. For agent-based workflows, the trust perimeter expands. Current best practice involves tight human-in-the-loop checkpoints. A model that catches its own errors enables longer autonomous runs before human review becomes necessary. For defensive security, early access matters. Anthropic's restricted-access program operates on the assumption that the window between defender access and adversary access is narrow.

For most practitioners, the near-term impact is indirect. Capybara will remain expensive and access-limited for months. But it establishes the performance floor for the next generation of Opus and Sonnet at lower price points. Within roughly a year, mid-tier models will absorb some of these capabilities. That pattern has held through every previous generation. What costs $40 per million tokens today will cost $15 tomorrow.

The governance question doesn't have a tidy resolution yet. Anthropic is briefing governments privately and releasing to defenders first, which represents more transparency than most labs would offer. Whether it's sufficient depends on how quickly competitors reach equivalent capability levels, and nobody outside those organizations knows the timeline.
Treat this as a prompt to audit where your team is spending time “babysitting” Opus; those are the workflows most likely to benefit if recursive self-correction is real. Start separating tasks into “must be reliable” work (large refactors, security-sensitive analysis, long agent runs) versus “good enough” work so you can justify when premium pricing is worth it. And if you operate in security or ship agentic tooling, watch closely for clearer ASL-4 definitions and be ready to tighten internal controls as model consistency improves.
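If you want to make that split operational rather than ad hoc, one option is to encode it in a small router. The sketch below is hypothetical: the tier names, budget threshold, and price figure are assumptions of mine, not real model identifiers or an actual pricing API.

```python
# Hypothetical task router: encode the "must be reliable" vs "good enough"
# split as data so the premium-tier decision is auditable, not ad hoc.
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    PREMIUM = "premium"      # e.g. a Capybara-class model, if/when available
    STANDARD = "standard"    # e.g. Opus/Sonnet-class models

@dataclass
class Task:
    name: str
    must_be_reliable: bool   # large refactors, security analysis, long agent runs
    est_tokens_millions: float

def route(task: Task, premium_budget_usd: float, premium_price_per_m: float = 40.0) -> Tier:
    """Send a task to the premium tier only if it needs the reliability and fits budget."""
    if not task.must_be_reliable:
        return Tier.STANDARD
    if task.est_tokens_millions * premium_price_per_m > premium_budget_usd:
        return Tier.STANDARD  # over budget: flag for human review instead of overspending
    return Tier.PREMIUM

jobs = [
    Task("30-file refactor, behavior-preserving", True, 2.0),
    Task("CRUD endpoint boilerplate", False, 0.3),
    Task("vulnerability triage on auth service", True, 1.2),
]
for job in jobs:
    print(job.name, "->", route(job, premium_budget_usd=150).value)
```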