The Paradigm Shift
For most of the last twenty years, finding a critical vulnerability in serious software was a job for a small number of very stubborn people. Months of reading source. Months of fuzzing. The occasional flash of insight that turned a memory corruption bug into something that actually pops a shell. The reason zero-days felt rare wasn’t that the bugs weren’t there. It was that the people who could find them were rare.
That constraint is gone.
Anthropic’s Project Glasswing and its Claude Mythos Preview model have made one thing very clear: AI reasoning has reached the level of an experienced security researcher. The model fully autonomously discovered and exploited a 17-year-old remote code execution bug in FreeBSD (CVE-2026-4747). It found a 27-year-old vulnerability in OpenBSD, a system whose entire reputation is built on not having those. It hit 72.4% on autonomous exploit development in the Firefox JavaScript shell, against effectively 0% from the previous frontier model six months earlier [source].
Right now, this capability is sitting inside a handful of top tech companies, pointed defensively at the software our daily lives depend on. That won’t be the case for long. Anthropic isn’t the only one. OpenAI launched Daybreak this week. Microsoft announced MDASH. The controlled window is closing faster than anyone expected.
We are heading into a surge in CVE disclosures unlike anything cybersecurity has seen. The first wave is going to hit when the Glasswing report drops in July 2026 and the patched bugs start getting disclosed publicly. The bigger waves come later, when Mythos-class capabilities become widely accessible, and security-researcher-grade discovery ends up in anyone’s hands, including the ones we don’t want it in.
This post is about what those waves are going to look like, why our remediation infrastructure isn’t ready, and what we’d actually do about it if we were starting today.
Two Eras of Security Research
The Manual and AI-Assisted Era (Security Research as a Craft)
Until recently, three things bottlenecked vulnerability discovery, and through it, the entire shape of the threat landscape:
- Expertise. Finding a critical exploit in something like a kernel or a browser engine took elite knowledge that maybe a few thousand people in the world had. AI assistants helped researchers go faster over the last three years, but the human was still the engine. The AI was a co-pilot.
- Time. Even for the best researchers, deep work on a single target was measured in weeks and months. You couldn’t run ten of yourself in parallel.
- Capital. Hiring those researchers, or buying their findings on the offensive market, was expensive. That cost set a natural ceiling on how many bugs got weaponized.
The result was a world where critical CVEs were periodic events. Not rare, exactly (Heartbleed, Shellshock, Spectre, Log4Shell), but spaced out enough that the industry could mostly mobilize around each one. We had time to breathe between patches.
And even then, plenty of organizations stayed vulnerable for years after fixes shipped, because patching at scale is genuinely hard. Hold that thought. It’s about to matter a lot more.
The Agentic Era (Security Research as an Autonomous Process)
Now compare that to where we are today.
AI models can autonomously read a codebase, reason about its security properties, identify vulnerabilities, chain multiple bugs into a working exploit, and produce proof-of-concept code, all without a human in the loop. That last sentence used to describe the top 1% of the top 1% of researchers. It now describes a model you can rent by the token.
The constraints have inverted:
- No deep human expertise required.
- No multi-month timelines required.
- Minimal capital required.
In the AI-assisted era, more software started getting written, faster, by smaller teams. In the agentic era, that same software is also being read, analyzed, and probed for vulnerabilities, faster, by smaller teams. Both curves are accelerating at the same time.
And now AI can generate malware at machine speed, tailored precisely to the vulnerabilities it uncovers.
This is the new geometry. Discovery is no longer the bottleneck. Remediation is.
The First Wave: The Glasswing Disclosure (July 2026)
Project Glasswing is a coalition of major industry players (AWS, Apple, Broadcom, Cisco, Google, JPMorgan Chase, Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks, CrowdStrike, and Anthropic) given controlled access to Mythos Preview to find and fix bugs in their own software before adversaries can.
Anthropic has committed $100M in model usage credits to back the work. Per Anthropic’s own writeup, Mythos Preview has already surfaced thousands of high-severity vulnerabilities, including ones in every major operating system and every major web browser.
When the public summary report drops around July 2026 and the patched bugs start getting disclosed, that disclosure stream is going to be the largest single batch of legitimate critical CVEs the industry has ever processed at once.
And that’s before accounting for what’s running in parallel. This week, OpenAI launched Daybreak and Microsoft announced MDASH, both AI-native vulnerability discovery programs actively surfacing bugs across their own massive codebases. July’s disclosure queue isn’t one program’s output anymore. It’s three converging streams, all landing on the same remediation infrastructure at once. If you operate any of the affected platforms (and given the scope, almost everyone does), you should already be modeling July as a patching event. Not a security team problem. A business continuity problem.
The Bigger Waves Ahead
Glasswing is the controlled release. The waves after it won’t be controlled.
Mythos itself is a gated research preview. But other labs are building toward similar capability, and the cost of using the resulting models will keep falling. As Mythos-class reasoning becomes broadly available, through commercial APIs, fine-tuned open-weight models, or just successor frontier releases, the rate of disclosure is going to outpace any process built on humans coordinating with humans.
Call it the “Log4Shell every week” reality. It’s worth sitting with what that actually means day to day.
As our partners at Cloudflare recently noted, some security teams are now operating under two-hour SLAs from CVE release to patch in production. The instinct is understandable: when the attacker timeline shortens, the defender timeline has to shorten with it. But faster alone isn’t going to cut it, and a lot of teams are about to learn that the hard way.
When Log4Shell hit, most security teams lost two to four weeks. Patching, dependency audits, customer questions, board briefings, the whole cascade. One CVE. Now picture that as a normal week.
That week is this week. As this post was being written, OpenAI launched Daybreak and Microsoft announced MDASH, both AI-native vulnerability discovery platforms now actively scanning codebases at scale. Three major AI programs converging on the same problem, in the same month. The flood isn’t a forecast anymore. It started.
The math doesn’t favor defenders, and it especially doesn’t favor open source. Most critical OSS projects are run by a handful of people, often unpaid, often nights and weekends. Their code is public. Which means an attacker can point a Mythos-class agent at it and run hundreds of parallel research sessions while the maintainer sleeps. The attacker has the compute. The maintainer has a day job.
The disclosure norms we built over the last two decades (researcher emails maintainer, maintainer ships patch, everyone coordinates timing) only worked because finding bugs was hard. That assumption is breaking.
These fundamental shifts in the repeating cycle of discovery, weaponization, attack, and response are not just operational imperatives for our vulnerability management teams. They are foundational changes for our incident handlers, DevSecOps, resiliency teams, and of course our partners in IT and Product Delivery.
How to Protect Against the CVE Floods
CVE floods are going to hit every layer of the stack: firewalls, operating systems, web frameworks, browsers, ML libraries, and the ocean of open source packages underneath all of it. As Mythos-class models become more accessible, any organization with a slow remediation process is going to be exposed in a way they haven’t been before.
Three things matter more than they did a year ago. None of them are new ideas. All of them have just become non-optional.
1. Stress-Test Your Mean Time to Remediate
You need to know (actually know, not estimate from a slide) how fast your organization can move when ten critical CVEs hit your stack at once. Not one. Ten. Simultaneously. With overlapping dependencies and no clean prioritization order.
Take a page from chaos engineering and run a critical CVE flood drill. Pick a quarterly cadence. The rules look something like this:
- Pick ten critical vulnerabilities across your codebase, dependencies, and runtime infrastructure.
- Give the team 24 hours to triage all ten, identify which assets are actually exposed, draft fixes or mitigations, and get the highest-risk ones into production.
- Measure where the friction was: was it identification, prioritization, ownership, code review, deploy windows, customer comms?
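One way to make the "measure where the friction was" step concrete is to log elapsed hours per finding per stage during the drill and let a small script point at the bottleneck. What follows is a minimal sketch, not a prescribed tool: the stage names mirror the list above, and `DrillFinding`, `summarize_drill`, and the per-finding 24-hour budget are all hypothetical names and assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DrillFinding:
    """One simulated critical CVE worked during the drill (hypothetical type)."""
    finding_id: str  # a drill label such as "DRILL-1", not a real CVE ID
    # Elapsed hours spent in each stage, e.g. {"ownership": 6.5}
    hours_by_stage: dict[str, float] = field(default_factory=dict)

    def total_hours(self) -> float:
        return sum(self.hours_by_stage.values())


def summarize_drill(findings: list[DrillFinding], budget_hours: float = 24.0) -> dict:
    """Aggregate stage timings and flag findings that blew the time budget.

    Assumes findings are worked in parallel, so each finding's own elapsed
    time is compared against the drill budget.
    """
    per_stage: dict[str, float] = defaultdict(float)
    for finding in findings:
        for stage, hours in finding.hours_by_stage.items():
            per_stage[stage] += hours

    # The stage that consumed the most total hours is the first place to look.
    bottleneck = max(per_stage, key=per_stage.get) if per_stage else None
    over_budget = [f.finding_id for f in findings if f.total_hours() > budget_hours]
    return {
        "hours_per_stage": dict(per_stage),
        "bottleneck": bottleneck,
        "over_budget": over_budget,
    }


if __name__ == "__main__":
    drill = [
        DrillFinding("DRILL-1", {"identification": 2.0, "ownership": 10.0}),
        DrillFinding("DRILL-2", {"identification": 1.0, "ownership": 20.0, "code_review": 5.0}),
    ]
    print(summarize_drill(drill))
```

Tracking the same summary from one quarterly drill to the next gives you a trend line that comes from measurement rather than a slide.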
The first time you run this, it will go badly. That’s the point. The drill exists so the friction shows up in a Tuesday afternoon exercise, not on the day a real Glasswing-scale disclosure lands.
2. Prioritize What Matters Most (For Real This Time)
Most vulnerability programs still prioritize by CVSS score. CVSS tells you how bad a bug could be in the abstract, not whether it can hurt you, given what you actually have deployed and how it’s exposed.
That mismatch was tolerable when the queue was hundreds of findings a quarter. It will not survive a queue of thousands.
The teams that will hold up under the flood are the ones whose prioritization is driven by real context:
- Runtime signal. Is this code path actually executed in production? Is the vulnerable function reachable from an external entry point?
- Exposure. Is the asset internet-facing? Behind which controls? Reachable from which network segments?
- Environmental fit. Are the preconditions for exploitation present in your environment, or only in the textbook?
- Business context. Is this in a system handling customer data, money, or admin authority, or in a contained sandbox?
The goal isn’t to fix less. The goal is to fix the right things first, fast, while the long tail gets worked through over weeks instead of a panicked weekend.
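To illustrate how the four context signals can reorder a queue that CVSS alone would get wrong, here is a rough sketch of a context-weighted score. The `Finding` fields, the `priority` function, and every multiplier in it are assumptions chosen for illustration, not a calibrated model; a real program would tune them against its own environment and incident history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A triaged vulnerability plus environment context (hypothetical schema)."""
    finding_id: str
    cvss: float                   # base score, 0.0 to 10.0
    executed_in_prod: bool        # runtime signal: is the code path actually run?
    externally_reachable: bool    # runtime signal: reachable from an external entry point?
    internet_facing: bool         # exposure
    preconditions_met: bool       # environmental fit
    handles_sensitive_data: bool  # business context


def priority(f: Finding) -> float:
    """Scale the CVSS base score by context. All weights are illustrative."""
    score = f.cvss
    if not f.preconditions_met:
        score *= 0.1   # exploitable only "in the textbook"
    if not f.executed_in_prod:
        score *= 0.2   # vulnerable code is present but never runs
    if f.externally_reachable:
        score *= 1.5
    if f.internet_facing:
        score *= 1.5
    if f.handles_sensitive_data:
        score *= 1.5
    return score


def rank(findings: list[Finding]) -> list[Finding]:
    """Highest contextual priority first."""
    return sorted(findings, key=priority, reverse=True)


if __name__ == "__main__":
    queue = [
        # High CVSS, but the vulnerable code never executes in production.
        Finding("F-dormant", 9.8, False, False, False, True, False),
        # Lower CVSS, but live, exposed, and sitting on customer data.
        Finding("F-exposed", 7.5, True, True, True, True, True),
    ]
    for f in rank(queue):
        print(f.finding_id, round(priority(f), 2))
```

In this sketch the lower-CVSS finding outranks the higher one because it is live, exposed, and sits on sensitive data. The dormant finding is not dropped. It lands in the long tail.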
3. Shift to AI: Build a Remediation Pipeline That Can Run Without You, and AI-Enable Your Resilience Teams and Incident Responders
Calendar-speed remediation can’t beat machine-speed discovery. That’s a structural fact, not a productivity problem.
Microsoft’s VP of Agentic Security said it plainly this week: “The durable advantage lies in the agentic system around the model, not in any single model itself.” Discovery is being commoditized. The organizations that win are the ones who built the system around it.
For years the answer was shift left: push security work onto developers. That bought us something, but it didn’t change the math: humans were still in every loop. The Shift to AI does. An agent takes a triaged, prioritized finding and turns it into a draft fix, an identified owner, a PR, and a deploy plan, without a human pulling each step forward by hand. The human stays in the loop where it matters: is this exploitable in our environment? Is this fix safe to ship? They stop being the bottleneck on coordination.
If your remediation flow today depends on a security engineer manually opening a ticket, hunting down the code owner, writing the description, waiting on Slack, and re-pinging on Friday, that flow isn’t going to survive a CVE flood. Build it like you mean it now, while the volume is still sane enough to test against.
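The shape of that flow can be sketched in a few lines. This is a toy skeleton under stated assumptions, not a reference architecture: `RemediationPlan`, `build_plan`, and `run_pipeline` are hypothetical names, `draft_fix_fn` stands in for whatever agent or model call drafts the patch, `approve_fn` stands in for the human judgment gate, and `security-oncall` is a made-up fallback owner.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RemediationPlan:
    """Everything a reviewer needs in one place (hypothetical structure)."""
    finding_id: str
    component: str
    owner: str
    draft_fix: str
    pr_title: str
    deploy_plan: str
    approved: bool = False


def build_plan(
    finding_id: str,
    component: str,
    owners: dict[str, str],
    draft_fix_fn: Callable[[str, str], str],
) -> RemediationPlan:
    """Automate the coordination steps: owner lookup, draft fix, PR, deploy plan."""
    # Fall back to an assumed on-call rotation when no code owner is registered.
    owner = owners.get(component, "security-oncall")
    return RemediationPlan(
        finding_id=finding_id,
        component=component,
        owner=owner,
        draft_fix=draft_fix_fn(finding_id, component),
        pr_title=f"[security] Remediate {finding_id} in {component}",
        deploy_plan=f"Canary {component}, then staged rollout after review by {owner}",
    )


def run_pipeline(
    findings: list[tuple[str, str]],
    owners: dict[str, str],
    draft_fix_fn: Callable[[str, str], str],
    approve_fn: Callable[[RemediationPlan], bool],
) -> tuple[list[RemediationPlan], list[RemediationPlan]]:
    """Return (approved, held) plans; the human decision is the only manual step."""
    approved, held = [], []
    for finding_id, component in findings:
        plan = build_plan(finding_id, component, owners, draft_fix_fn)
        plan.approved = approve_fn(plan)  # human gate: exploitable here? safe to ship?
        (approved if plan.approved else held).append(plan)
    return approved, held
```

The design choice worth noticing is where the human sits: the opening of tickets, the hunt for an owner, and the writing of the PR description all happen before `approve_fn` is called, so the reviewer's time goes to the two questions that need judgment.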
What This All Comes Back To
The thing to understand about Glasswing is that it didn’t change what’s possible. The bugs were always there. OpenBSD didn’t get less secure overnight; it just got read by something that could see what 27 years of human auditors couldn’t.
What changed is who can do the looking, and how fast.
We don’t get to opt out of that shift. The waves are coming whether or not our processes are ready. The only question on the table is whether your organization spends the next year building the muscle to absorb them (running the drills, tightening the prioritization, automating the boring parts of remediation), or whether you find out how unprepared you are at the same time as everyone else, with the clock running.
Pick the first one. The window to do it on calendar speed is closing.
