Anthropic’s New Hacking AI Is So Good, They’re Locking It Up

TL;DR

Anthropic launched Project Glasswing, deploying Claude Mythos Preview exclusively for cybersecurity defense — the model autonomously found thousands of high-severity vulnerabilities, including a 27-year-old bug in OpenBSD’s TCP SACK implementation.
Claude Mythos Preview scored 83.1% on the CyberGym benchmark and achieved full control-flow hijack on 10 fully patched targets, compared with a single success for its predecessor, Claude Opus 4.6.
Anthropic is committing up to $100 million in usage credits and $4 million in direct donations to open-source security organizations, with eleven tech giants and financial institutions as initial partners.
The model won’t get a general release. Anthropic cites safety concerns, echoing OpenAI’s controversial 2019 decision to withhold GPT-2. For partners, pricing is set at $25 to $125 per million tokens after credits expire.

Claude Mythos Preview Finds Bugs Humans Missed for Decades

Anthropic dropped Project Glasswing this week, and it’s already rewriting the cybersecurity playbook. The company deployed its new frontier model Claude Mythos Preview exclusively for defensive security work — and the results are frankly alarming in the best possible way.

The model autonomously discovered thousands of high-severity vulnerabilities across major operating systems and web browsers. We’re not talking about fresh code, either. Claude Mythos Preview unearthed a 27-year-old bug in OpenBSD’s TCP SACK implementation and a 16-year-old vulnerability lurking in FFmpeg.

According to Anthropic, “Mythos Preview autonomously – without any human intervention – found vulnerabilities that had gone undetected for decades.” No human guidance. No hints. Just a model chewing through codebases and surfacing flaws that survived multiple security audits, countless automated scans, and the scrutiny of some of the sharpest minds in open-source development.

The benchmark numbers back up the hype. Claude Mythos Preview scored 83.1% on the CyberGym benchmark, compared to 66.6% for its predecessor, Claude Opus 4.6. On real-world exploitation tasks, Mythos Preview achieved full control-flow hijack on 10 fully patched targets, while Opus 4.6 managed just one.

Eleven tech giants and financial institutions signed on as initial partners, with over 40 additional organizations gaining access. Anthropic is bankrolling the effort with up to $100 million in usage credits and $4 million in direct donations to open-source security organizations.

Why Anthropic’s Defensive-Only Approach Actually Makes Sense

Here’s where it gets interesting — and where I think Anthropic deserves credit for threading a genuinely difficult needle. The company isn’t releasing Claude Mythos Preview to the public. At all.

This is the GPT-2 playbook all over again, except this time the “too dangerous to release” framing might actually hold water. OpenAI’s 2019 decision to initially withhold GPT-2 aged like milk — the model turned out to be nowhere near as risky as the company claimed, and the whole episode reeked of marketing theater. But a model that can autonomously discover and exploit zero-days in production systems? That’s a different beast entirely.

The criticism writes itself, though. Anthropic’s pricing model — $25 to $125 per million tokens after the credits expire — could lock out smaller security organizations and independent researchers who need access most. The companies that can afford those rates are exactly the ones already staffed with security teams. It’s like handing out free gym memberships to professional athletes.

But here’s the thing: I’d rather see Anthropic err on the side of caution with a model this capable than watch it leak into the wild and fuel a new arms race in offensive hacking. The company is essentially treating Mythos Preview like a controlled substance — tightly monitored distribution, strict use-case boundaries, and a clear commitment to defensive applications only.

Think of it like this: you wouldn’t hand out lockpicking tools at a hardware store, but you’d absolutely give them to a locksmith. Claude Mythos Preview is the lockpicking set, and Anthropic is vetting the locksmiths.

The model’s capabilities extend well beyond security, which makes the restricted deployment even more significant. Mythos Preview hit 93.9% on SWE-bench Verified (Opus 4.6 scored 80.8%) and an eye-watering 97.6% on USAMO 2026, a benchmark based on the USA Mathematical Olympiad, where Opus 4.6 managed just 42.3%. This isn’t a narrow specialist model. It’s a frontier system that could dominate across domains, and Anthropic is deliberately constraining it to defensive cybersecurity.

How Project Glasswing Reshapes the AI Safety Debate

Anthropic’s move positions the company as the de facto leader in responsible frontier AI deployment. While OpenAI and Google ship increasingly powerful models to millions of users — chasing scale, mindshare, and the network effects that come with broad adoption — Anthropic is betting on a different strategy entirely.

This is differentiation through restraint. And it’s a direct challenge to the prevailing wisdom that frontier labs must release their best models widely to stay competitive.

The competitive stakes are real. OpenAI and Google both have cybersecurity initiatives, but neither has committed a frontier model exclusively to defensive security work. Anthropic is carving out territory that its rivals either can’t or won’t claim — and in doing so, it’s establishing a precedent that could reshape how the industry thinks about deploying truly dangerous capabilities.

Project Glasswing also builds directly on Anthropic’s Constitutional AI framework and its years of red-teaming work. The company has spent considerable effort developing methods to align models with human values and constrain their behavior within acceptable boundaries. Mythos Preview is the first major test of whether those techniques can contain a model capable of autonomous exploitation at scale.

The shift toward bounded deployment for high-impact use cases represents a broader industry trend. AI labs are increasingly targeting specific verticals — healthcare, legal research, scientific discovery — rather than releasing general-purpose models and hoping developers find valuable applications. Anthropic is taking that logic to its extreme: a frontier model so powerful it only gets deployed for a single, carefully controlled purpose.

Does this approach scale? Probably not. You can’t build a sustainable business by locking your best product in a vault and handing out keys to a few dozen organizations. But as a proof of concept for responsible capability deployment, it’s hard to argue with the results so far: thousands of vulnerabilities discovered, decades-old bugs patched, and access kept to defensive use only.

What the OpenBSD and FFmpeg Discoveries Signal About AI’s Security Future

The 27-year-old OpenBSD bug deserves special attention. OpenBSD is legendary for its security-first culture — the project’s developers are obsessive about code quality, and the OS has a reputation for being nearly bulletproof. If Claude Mythos Preview found a vulnerability that survived 27 years of scrutiny in that codebase, what’s hiding in less rigorously audited systems?

The FFmpeg vulnerability, 16 years old, underscores the same point. FFmpeg is everywhere — it’s the backbone of video processing across the web, embedded in everything from browsers to streaming platforms. A bug that old, in code that widely deployed, represents exactly the kind of systemic risk that keeps security professionals awake at night.

These aren’t theoretical exploits or edge cases that require exotic conditions to trigger. They’re real vulnerabilities in production code, and they’ve been sitting there for decades. Which raises an uncomfortable question: how many more are out there?

The answer, according to Anthropic’s results, is thousands. And that’s just what one model found in its initial sweep. As AI capabilities continue to advance, the gap between what automated systems can discover and what human security teams can audit is going to widen — fast.

There’s a darker implication here, too. If Anthropic’s defensive model can find these bugs autonomously, so can an offensive model built by someone with fewer scruples. The genie isn’t going back in the bottle. The question is whether the defensive applications can stay ahead of the offensive ones — and whether the industry can coordinate effectively enough to make that happen.

Anthropic’s $100 million in usage credits and $4 million in direct donations suggest the company understands the stakes. Those aren’t token gestures — they’re a genuine attempt to tilt the playing field toward defense. Whether it’s enough remains to be seen, but it’s a hell of a lot more than most AI labs are doing.

Tracking Mythos Preview’s Expansion and the Broader Security AI Arms Race

The immediate question is how quickly Anthropic expands access beyond the initial 50-plus organizations (the eleven launch partners plus more than 40 others). The company will face pressure from two directions: security researchers demanding broader availability, and safety advocates urging even tighter restrictions. How it balances the two will determine whether Project Glasswing becomes a model for responsible deployment or a cautionary tale about bottlenecking critical capabilities.

Watch how Anthropic handles the pricing transition once the usage credits run out. If smaller organizations and open-source projects get priced out, the defensive advantage concentrates among well-funded enterprises — exactly the opposite of what the security ecosystem needs. A tiered pricing model or extended credits for nonprofit security work could address this, but Anthropic hasn’t committed to either yet.
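To put the quoted range in perspective, here is a rough back-of-the-envelope sketch in Python. It takes only the $25 and $125 per-million-token figures from the reported pricing. The 200-million-token workload is a purely hypothetical number chosen for illustration, not something Anthropic or any partner has published, and the function name is made up for this example.

```python
# Rough cost sketch for the quoted pricing range.
# The two prices come from the reported range; the workload size below is a
# hypothetical assumption for illustration, not a published figure.
PRICE_LOW_USD_PER_MTOK = 25    # low end of the quoted range
PRICE_HIGH_USD_PER_MTOK = 125  # high end of the quoted range


def cost_range_usd(tokens: int) -> tuple[float, float]:
    """Return the (low, high) cost in USD for a given number of tokens."""
    millions = tokens / 1_000_000
    return (millions * PRICE_LOW_USD_PER_MTOK,
            millions * PRICE_HIGH_USD_PER_MTOK)


# Hypothetical audit that consumes 200 million tokens in total.
low, high = cost_range_usd(200_000_000)
print(f"${low:,.0f} to ${high:,.0f}")  # prints "$5,000 to $25,000"
```

Under that assumed workload, a single audit would land somewhere between $5,000 and $25,000 depending on where in the range the usage is billed. That is pocket change for a large enterprise but a real line item for a volunteer-run open-source project.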

The competitive response from OpenAI and Google matters just as much. If either company launches a rival security-focused model with broader availability or more aggressive pricing, Anthropic’s first-mover advantage evaporates. But if they follow Anthropic’s lead and adopt similarly restrictive deployment models, that signals a genuine shift in how frontier labs approach dangerous capabilities. The next six months will clarify which path the industry takes.

FAQ

What is Claude Mythos Preview and why isn’t it publicly available?

Claude Mythos Preview is Anthropic’s newest frontier AI model, deployed exclusively for defensive cybersecurity through Project Glasswing. The company isn’t releasing it publicly due to safety concerns — the model can autonomously discover and exploit security vulnerabilities, making unrestricted access potentially dangerous. Anthropic is instead providing access only to vetted organizations focused on defensive security work.

How does Claude Mythos Preview compare to previous Claude models in cybersecurity tasks?

Claude Mythos Preview scored 83.1% on the CyberGym benchmark compared to 66.6% for Claude Opus 4.6. More dramatically, Mythos achieved full control-flow hijack on 10 fully patched targets during testing, while Opus 4.6 managed just one. The model also reached 93.9% on SWE-bench Verified versus Opus 4.6’s 80.8%, demonstrating significant capability improvements across multiple domains.

What vulnerabilities has Claude Mythos Preview discovered so far?

Claude Mythos Preview autonomously discovered thousands of high-severity vulnerabilities across major operating systems and web browsers, including a 27-year-old bug in OpenBSD’s TCP SACK implementation and a 16-year-old vulnerability in FFmpeg. These discoveries were made without human intervention, identifying flaws that had survived decades of security audits and automated scanning tools.

How much does access to Claude Mythos Preview cost?

Anthropic is providing up to $100 million in usage credits to partner organizations initially, along with $4 million in direct donations to open-source security organizations. After credits expire, pricing ranges from $25 to $125 per million tokens. Eleven tech giants and financial institutions are initial partners, with over 40 additional organizations gaining access through the program.

Source: The Decoder