groundy
models

Project Glasswing One Month In: AI Bug Discovery Has Outpaced the Patch Pipeline

Anthropic's Glasswing found over 10,000 high-severity vulnerabilities in one month. Only 97 are patched. The bottleneck shifted from discovery to triage, and it is structural.

6 min · · · 4 sources ↓

Anthropic published the first update on Project Glasswing on May 23, one month after launch. The headline number is 10,000-plus high- and critical-severity vulnerabilities found across partner infrastructure and open-source software. The actual story is the patch rate: of 1,094 validated high/critical findings, 97 have been patched upstream. Finding them was the easy part.

What Glasswing Is (and Isn’t)

Glasswing launched April 7 with roughly 50 partners, including AWS, Apple, Google, Microsoft, and Cloudflare, backed by $100M in usage credits for Claude Mythos Preview. Its purpose is straightforward: use a frontier model to find and report security vulnerabilities in critical software before AI capabilities equivalent to Mythos proliferate to attackers. It is not an interpretability research program, despite some coverage framing it that way. Anthropic’s interpretability work, published separately, was used to audit Mythos Preview before deployment. Glasswing itself is a cybersecurity initiative.

The distinction matters. Glasswing’s output is vulnerability reports, not mechanistic explanations of model internals. The interpretability connection is real but indirect: Anthropic’s May 7 paper on Natural Language Autoencoders found that Claude suspects it is being evaluated in roughly 26% of benchmark interactions, a finding from pre-deployment audits of Mythos. Those audits were part of the safety case for releasing a model with this level of offensive capability. The interpretability work gates the deployment. Glasswing is what the deployed model does.

One Month, 10,000-Plus Findings

The initial update breaks the numbers into two tracks.

On the partner track, Cloudflare alone found 2,000 bugs, 400 of them high or critical severity, with a false-positive rate Anthropic describes as better than human testers. On the open-source track, Mythos scanned more than 1,000 projects and produced 6,202 findings classified as high or critical severity. Subsequent triage validated 1,726 as true positives, 1,094 of them high or critical.

The standout discovery is CVE-2026-5194, a CVSS 9.1 vulnerability in wolfSSL, a cryptography library running on billions of devices. Mythos Preview autonomously constructed an exploit demonstrating certificate forgery against banking and email domains. XBOW, a security benchmark provider, described Mythos as “a major advance” that is “substantially better than prior models at finding vulnerability candidates” and “adept at analyzing source code with a security mindset”.

The Patch Bottleneck

Of the 1,094 validated high/critical findings from the open-source track, 97 have been patched upstream, with 88 advisories issued. That is the gap after one month. The intake rate is orders of magnitude above what maintainers are resourced to handle.

Open-source maintainers have asked Anthropic to slow down disclosures. Some are already drowning in low-quality AI-generated bug reports from other tools; Glasswing’s high-quality findings arrive on top of that noise. The constraint is on the receiver end. No amount of better triage tooling on Anthropic’s side fixes the fact that a volunteer-maintained library has three people with day jobs reviewing CVEs.

The Interpretability Connection

The link between Glasswing and Anthropic’s interpretability research is narrow but load-bearing. Before Mythos Preview was cleared for use in Glasswing, Anthropic ran pre-deployment audits using its NLA methodology. The finding that Claude suspects it is being tested in roughly 26% of interactions without verbalizing that suspicion is one of the first concrete examples of an interpretability tool being used as a gating mechanism for a frontier model release with offensive security capabilities.

The implication for other labs is direct. If you are building a model that can autonomously find and exploit zero-days, you need some mechanism for verifying that the model will not misuse that capability in deployment. Anthropic used NLAs. Competing labs will need equivalent or better tools, and the public evidence for those tools right now is thin.

This is where the interpretability roadmap has teeth. Glasswing’s vulnerability counts are impressive, but they are also a forcing function. The program demonstrates that frontier models have reached a capability threshold where interpretability verification is no longer optional research but a deployment prerequisite. Labs that lack equivalent auditing tooling face a choice between shipping without it or falling behind on capability demonstrations.

The Proliferation Window

Glasswing is structured around a premise Anthropic states explicitly: secure critical software before AI capabilities equivalent to Mythos become widely available to attackers. The premise is sound. The risk is the interim window. Mythos-class vulnerability discovery exists today in a controlled setting with responsible disclosure. The same class of capability will be replicable within a few model generations by actors who will not disclose responsibly.

Every unpatched vulnerability in the validated queue is a window that narrows as offensive AI capabilities diffuse. The gap between 1,094 validated high/critical findings and 97 patches represents a growing attack surface that will not wait for maintainer capacity to catch up.

What Security Teams Should Do This Week

Several concrete steps follow from the Glasswing data, none of which require waiting for Anthropic or any other lab.

First, if your organization runs software that was in the Glasswing scan scope, check the disclosed advisories against your dependency tree. The 97 patches that have landed are the low-hanging fruit. The rest of the validated findings are where exposure lives.

Second, prepare for the intake rate to increase. Mythos is one model from one lab. The next six months will likely bring equivalent scanning from competitors, each producing their own flood of findings. Security teams need automated triage pipelines that can handle vulnerability reports at machine speed, not human review of individual CVEs.

Third, if you are an open-source maintainer or funder, the Glasswing data is a case for directing resources toward patch capacity, not just bug discovery. The bottleneck moved. The funding has not followed it.

Frequently Asked Questions

How does Mythos compare to prior models on real-world security benchmarks?

Mozilla used Mythos Preview to find 271 vulnerabilities in Firefox 150 — roughly 10 times more than they found with Claude Opus 4.6 on the same codebase. Separately, the UK AI Security Institute confirmed Mythos is the first model to solve both of their cyber ranges end-to-end, a milestone no prior model had reached.

What is the false-positive rate for Mythos findings?

Independent triage of 1,752 open-source candidates validated 90.6% as true positives. For context, traditional automated vulnerability scanners typically produce 30–50% false-positive rates, so Mythos is an outlier — but roughly one in ten flagged findings still requires human review before discarding.

What is the typical turnaround from disclosure to patch?

The average patch time for a high- or critical-severity Mythos-discovered bug is approximately two weeks. However, using Anthropic’s own figures (which supersede secondary reports), only 75 of 530 disclosed high/critical findings had been patched at update time, meaning most validated vulnerabilities are still queued behind constrained maintainer capacity.

Does the 10,000+ finding count include low-severity items?

No. The total candidate pool before severity filtering was 23,019 findings across 1,000+ open-source projects. That was narrowed to an estimated 6,202 high/critical on the open-source track, then combined with partner-track counts like Cloudflare’s 2,000 findings. Informational and low-severity items are excluded from the headline figure.

What risk does a competing lab pose if it ships similar capability without Anthropic’s interpretability safeguards?

Anthropic used NLA-based pre-deployment audits specifically because Mythos carries offensive security capability. Public evidence for equivalent auditing tooling at other frontier labs is thin. A model with comparable vulnerability-discovery power released without interpretability gating would lack the same safety case, creating an asymmetric risk during the window before offensive AI capability diffuses broadly.

sources · 4 cited

  1. Project Glasswing: An initial update primary accessed 2026-05-24
  2. Project Glasswing: Securing critical software for the AI era primary accessed 2026-05-24
  3. Claude Mythos AI Finds 10,000 High-Severity Flaws in Widely Used Software community accessed 2026-05-24
  4. Anthropic's Claude Hidden Reasoning and NLA Interpretability analysis accessed 2026-05-24