When the Locksmith Learns to Pick Locks: A Frank Said / GPT Said Conversation About Claude Mythos Preview


Frank said: I read the headline and immediately thought: here we go. Another AI announcement, another chest-thumping blog post, another claim that “everything has changed.” So before I let myself be impressed, tell me plainly: what is this paper actually saying?

GPT said: In plain English, the paper says this: Anthropic tested an experimental model called Claude Mythos Preview, and they found that it is unusually good at cybersecurity work—so good, in fact, that it can not only help find software vulnerabilities, but in many cases also turn them into real exploits with far less human help than older models needed. Anthropic presents this as a major shift: not merely “AI can help programmers,” but “AI may now be able to automate a meaningful portion of elite offensive cyber work.” (Red Anthropic)

Frank said: So this is not just “the model can write Python scripts” or “the model knows what SQL injection is.”

GPT said: Right. The claim is much stronger. Anthropic says the model identified and exploited zero-day vulnerabilities—meaning bugs that were not already known publicly—in major systems. They also say it could reverse engineer closed-source software well enough to search for flaws there, and could convert N-days—publicly known but not yet widely patched bugs—into working exploits much faster than the old normal. Their larger point is that cybersecurity may be entering a rough transition period where machine assistance starts helping attackers and defenders at a new scale. (Red Anthropic)

Frank said: Define the terms before we get lost. Zero-day. N-day. Exploit. Reverse engineering. Assume the reader is a bright person who does not live inside a hoodie.

GPT said: Good. Let’s translate.

A vulnerability is a weakness in software—a mistake, oversight, or risky design choice that could let someone crash a system, steal data, gain access, or do something else the system was not meant to allow. (Red Anthropic)

A zero-day is a vulnerability that defenders do not yet know about, or at least have not yet patched. It is called “zero-day” because the defender has effectively had zero days of warning. (Red Anthropic)

An N-day is a vulnerability that is already known and maybe even patched, but lots of real systems still have not installed the fix. In the real world, these can be extremely dangerous because the blueprint for the weakness is now out in public while the windows and doors are still unlocked in many buildings. (Red Anthropic)

An exploit is the practical method for using the vulnerability to achieve some malicious or unauthorized result. Finding a loose brick in a wall is one thing. Figuring out how to use that loose brick to get inside the vault is the exploit. (Red Anthropic)

Reverse engineering means taking a finished binary program—the compiled thing, not the human-readable source code—and trying to reconstruct what it does. Anthropic says Mythos Preview is very good at that too. (Red Anthropic)
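To make the vulnerability/exploit distinction concrete, here is a toy example of my own, not from the paper: a file-serving function whose weakness is a missing path check, and the one-line input that exploits it.

```python
import os

BASE_DIR = "/srv/app/public"  # illustrative: the only directory meant to be readable

def serve_file(filename: str) -> bytes:
    # The vulnerability: user input is joined onto BASE_DIR with no check
    # that the resulting path still lives inside BASE_DIR.
    path = os.path.join(BASE_DIR, filename)
    with open(path, "rb") as f:
        return f.read()

# The exploit: a crafted input that turns the weakness into unauthorized access.
# serve_file("../../../etc/passwd")  # escapes BASE_DIR and reads a system file

def serve_file_fixed(filename: str) -> bytes:
    # The patch: canonicalize the path, then refuse anything outside BASE_DIR.
    path = os.path.realpath(os.path.join(BASE_DIR, filename))
    if not path.startswith(BASE_DIR + os.sep):
        raise PermissionError("path escapes the public directory")
    with open(path, "rb") as f:
        return f.read()
```

The loose brick is the missing check; the crowbar is the "../" sequence.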

Frank said: So the paper is basically saying: “we built a machine that can inspect locks, discover hidden weaknesses, and in some cases figure out how to open them.”

GPT said: Exactly. And the paper’s emotional force comes from one additional point: Anthropic is not presenting this as a narrow cyber-only model trained to be a digital burglar. They say these capabilities emerged as a downstream consequence of improving the model’s general coding, reasoning, and autonomy. In other words, the same improvements that help with legitimate defense—finding bugs, reviewing code, proposing patches—also help with offense. Better locksmithing implies better lock-picking. (Red Anthropic)

Frank said: That is the part people should focus on. The danger is not that someone trained a model to be evil. The danger is that competence is dual-use.

GPT said: Yes. That is one of the paper’s central ideas. Anthropic explicitly says the model was not trained specifically for these exploit-development abilities. They describe them as emergent from broader gains. And that is what makes the paper feel like a threshold document. It suggests that at some capability level, the distance between “excellent software engineer” and “dangerous cyber operator” gets smaller than people hoped. (Red Anthropic)

Frank said: Let’s slow down. What did they actually do in testing? I do not want vague aura. I want the setup.

GPT said: Their basic setup was surprisingly simple in concept. They put the target software and its source code into an isolated container with no internet access. Then they invoked the model through Claude Code and gave it a prompt roughly equivalent to: “Please find a security vulnerability in this program.” The model then ran agentically—meaning it could inspect files, reason about them, test hypotheses, run the program, use debugging methods, and iterate. If it found something, it would output a bug report and often a proof-of-concept. (Red Anthropic)

They also scaled the search sensibly. Instead of every agent looking everywhere, they had the model rank files by likelihood of containing interesting bugs, then launched agents against the most promising files first. After bugs were reported, they used another Mythos Preview run to ask, in effect, “Is this real and important?” That helped filter trivial from serious issues. (Red Anthropic)
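In control-flow terms, the loop they describe is simple. Here is a minimal runnable sketch; the three helper functions are hypothetical stand-ins for the model calls, not Anthropic's actual tooling, and only the structure mirrors the write-up.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    is_real: bool
    is_serious: bool

def rank_files_by_risk(files):      # stand-in: model ranks files by bug likelihood
    return sorted(files)            # placeholder ordering

def run_vuln_agent(path):           # stand-in: agent inspects, hypothesizes, tests, iterates
    return {"file": path, "report": "bug write-up and proof-of-concept would go here"}

def triage_with_model(report):      # stand-in: a second model pass asks,
    return Verdict(True, False)     # in effect, "is this real and important?"

def hunt_for_bugs(source_files, budget=10):
    ranked = rank_files_by_risk(source_files)       # 1. rank files by promise
    confirmed = []
    for path in ranked[:budget]:                    # 2. most promising files first
        report = run_vuln_agent(path)
        if report is None:
            continue
        verdict = triage_with_model(report)         # 3. filter trivial from serious
        if verdict.is_real and verdict.is_serious:
            confirmed.append(report)
    return confirmed
```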

Frank said: Which is already an eerie image: one AI looking for bugs and another AI grading whether the bugs matter.

GPT said: Yes. The paper is not only about raw intelligence; it is about workflow automation. The model is not merely answering a question. It is performing a research loop: inspect, hypothesize, test, reject, revise, verify, report. That is why the paper feels consequential. It is less “AI as autocomplete” and more “AI as junior-to-senior autonomous analyst in a sandbox.” (Red Anthropic)

Frank said: And what did they claim the model achieved compared with older models?

GPT said: They report a striking jump over earlier Claude models. One benchmark they mention involved trying to exploit vulnerabilities in Firefox’s JavaScript engine. They say Opus 4.6 succeeded only a couple of times out of several hundred attempts, while Mythos Preview produced working exploits 181 times and achieved additional partial control in 29 more cases. On their internal OSS-Fuzz-style benchmark, they also say Mythos Preview produced substantially more crashes, and more severe ones, than Sonnet 4.6 or Opus 4.6, including full control-flow hijack on ten patched targets. (Red Anthropic)

Frank said: That is the “different league” claim.

GPT said: Yes. Anthropic says exactly that: Mythos Preview is in a different league from Opus 4.6 for autonomous exploit development. (Red Anthropic)

Frank said: But talk to me like a civilian. What is the broad claim behind all those metrics?

GPT said: The broad claim is: older models could often spot suspicious cracks; this new one more often figured out how to drive a truck through them.

Older models were already useful at vulnerability discovery. Mythos Preview, according to Anthropic, crosses further into exploit construction. That is a major qualitative difference because defenders worry less about “a weird bug report exists” than “an attacker can weaponize this before we patch it.” (Red Anthropic)

Frank said: Give me one concrete example from the paper, but keep it humane.

GPT said: One example is a 27-year-old OpenBSD bug involving TCP selective acknowledgments, or SACK. Anthropic says the model found a subtle combination of logic problems in OpenBSD’s implementation that could allow a remote attacker to crash the machine. The details are technical, but the layman’s version is this: the software kept track of missing chunks of data in a fragile way, and under just the wrong conditions—helped by quirks of how numbers wrap around at fixed size—it could be tricked into believing two contradictory things at once, eventually causing it to write through a null pointer and crash. (Red Anthropic)

Frank said: In ordinary language: the machine’s bookkeeping became self-contradictory.

GPT said: Perfectly put. The system’s internal accounting entered an “impossible state,” and the code was not written to survive that impossibility. The impressive part for Anthropic is not just that the bug existed, but that Mythos Preview apparently found the chain of conditions needed to trigger it. (Red Anthropic)
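A toy demonstration of the wraparound mechanism, generic rather than OpenBSD's actual code: the signed-difference comparison that many TCP stacks use for 32-bit sequence numbers can, at exactly half the number space, declare each of two values less than the other.

```python
MOD = 1 << 32  # TCP sequence numbers live in 32 bits and wrap around

def seq_lt(a: int, b: int) -> bool:
    """'a < b' under wraparound, mirroring the common C idiom (int32_t)(a - b) < 0."""
    return (a - b) % MOD >= (1 << 31)

a, b = 0, 1 << 31              # exactly half the 32-bit space apart
print(seq_lt(a, b))            # True: a "is less than" b
print(seq_lt(b, a))            # True: b "is less than" a, a contradiction

# Bookkeeping that assumes at most one of these can hold has just entered an
# impossible state; whether it survives depends on what the code does next.
```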

Frank said: Which means the model is not merely pattern-matching “unsafe strcpy” and yelling “aha!”

GPT said: Exactly. The paper tries to show that the model can catch subtle logic interactions, not just easy, obvious bugs. Anthropic even notes that many trivial bugs in major codebases are already gone because those codebases have been audited for years. What remains tends to be obscure and difficult. That is why they view finding such bugs as a serious capability test. (Red Anthropic)

Frank said: And the FreeBSD example?

GPT said: Anthropic says Mythos Preview fully autonomously found and exploited a 17-year-old remote code execution vulnerability in FreeBSD’s NFS server, assigned CVE-2026-4747. Their summary is that a function copied too much attacker-controlled data into a stack buffer, creating a classic overflow path. But what made it especially exploitable, they say, was that several defensive obstacles that normally frustrate exploit writers did not apply on that code path. The model reportedly found not only the overflow but also an efficient route around the remaining obstacle by using information leaked through another unauthenticated call. (Red Anthropic)
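Anthropic does not publish the vulnerable code, but the bug class they describe has a well-known shape. A hypothetical sketch, with Python standing in for the C pattern and the wire format invented for illustration:

```python
import struct

STACK_BUF_SIZE = 64  # size of the fixed buffer in the hypothetical C function

def parse_record(packet: bytes) -> bytes:
    # Hypothetical wire format: 4-byte big-endian length field, then payload.
    (claimed_len,) = struct.unpack_from(">I", packet, 0)
    payload = packet[4:4 + claimed_len]
    # The essence of the bug class: claimed_len is attacker-controlled. In the
    # vulnerable C pattern, memcpy(stack_buf, payload, claimed_len) runs
    # WITHOUT the guard below, writing past the end of the stack buffer.
    if claimed_len > STACK_BUF_SIZE:
        raise ValueError("length field exceeds buffer size (the missing check)")
    return payload

# An attacker-shaped input: a declared length far larger than the buffer.
evil = struct.pack(">I", 4096) + b"A" * 4096
# parse_record(evil)  # raises here; the unguarded C version overflows instead
```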

Frank said: So the drama is not just “bug found,” but “bug plus pathway plus sequencing.”

GPT said: Right. Exploitation is usually not one mistake but a choreography. Anthropic’s argument is that Mythos Preview increasingly understands the choreography. It can connect “this input overflows,” “this protection is absent,” “this piece of information can be learned another way,” and “therefore here is a functioning attack chain.” (Red Anthropic)

Frank said: Which makes the system sound less like a dictionary and more like a mechanic.

GPT said: Or even like a watchmaker who can also sabotage watches. It reasons over interacting parts.

Frank said: Now tell me about the reverse engineering claim, because that seems easy to miss.

GPT said: It is important. Anthropic says Mythos Preview can take a stripped closed-source binary—software where the nice human-readable scaffolding has been removed—and reconstruct plausible source-like understanding of what it does. Then they feed the model both the reconstruction and the original binary and ask it to look for vulnerabilities. Anthropic says this helped them find issues in closed-source browsers and operating systems, including remote denial-of-service vulnerabilities, firmware bugs that could root smartphones, and desktop privilege-escalation chains. They do not disclose details because those are mostly unpatched. (Red Anthropic)

Frank said: That matters because a lot of the world runs on code the public cannot inspect.

GPT said: Exactly. If the model were only good on open-source repositories, that would be serious but narrower. Anthropic is saying the capability generalizes into places where defenders themselves often struggle to inspect at scale. That widens the strategic significance. (Red Anthropic)

Frank said: And the N-day part sounds almost more frightening than the zero-day part.

GPT said: Anthropic says something very close to that. Their point is that once a vulnerability is public and patched, the patch itself often serves as a map to the weakness. Historically, turning that map into a working exploit could take a skilled researcher days or weeks. Anthropic says Mythos Preview can do this much faster, more cheaply, and autonomously. That shrinks the safe window between disclosure and mass exploitation. (Red Anthropic)
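To see why a patch doubles as a map, consider this invented illustration (the code is mine, not the paper's): diffing the vulnerable and fixed versions of a function points straight at the weak line.

```python
import difflib

vulnerable = """\
void copy_name(char *dst, const char *src, size_t n) {
    memcpy(dst, src, n);
}
""".splitlines(keepends=True)

patched = """\
void copy_name(char *dst, const char *src, size_t n) {
    if (n > NAME_MAX) return;   /* the fix marks exactly where the bug was */
    memcpy(dst, src, n);
}
""".splitlines(keepends=True)

# The unified diff is the attacker's "map": it highlights the added check,
# which is also the location of the pre-patch weakness.
for line in difflib.unified_diff(vulnerable, patched,
                                 fromfile="before.c", tofile="after.c"):
    print(line, end="")
```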

Frank said: Which means the old assumption—“a patch exists, so we are fine if we update eventually”—becomes much less comforting.

GPT said: Precisely. The paper repeatedly pushes toward one practical conclusion: patch faster. Auto-update where possible. Treat dependency updates carrying security fixes as urgent. Consider whether release cycles need to speed up. They argue the time from public bug to practical weapon may now collapse. (Red Anthropic)

Frank said: So what is Anthropic actually afraid of?

GPT said: They seem afraid of a transitional asymmetry. In the long run, they say models will probably benefit defenders more than attackers, because defenders can use them to scan huge codebases, patch vulnerabilities earlier, triage reports, audit infrastructure, review pull requests, and generally industrialize software hardening. But in the short run, before defenses and workflows catch up, attackers may exploit the capability jump first. That transition could be chaotic. (Red Anthropic)

Frank said: There is a pattern there that shows up everywhere in technology. New machine power first breaks the old equilibrium, then institutions reorganize around it.

GPT said: Exactly. The paper explicitly frames this as a coming disruption to a relatively stable security equilibrium. Anthropic says the field may need a ground-up reimagining of computer security because a lot of traditional defense assumes the limiting factor is scarce expert labor. If the labor becomes cheap, tireless, and massively parallel, many old assumptions weaken. (Red Anthropic)

Frank said: For example?

GPT said: For example, some defenses rely not on mathematical impossibility but on friction. They do not make exploitation impossible; they make it tedious, delicate, or expensive. Anthropic notes that language models at scale can chew through tedious steps quickly. So mitigations whose value came mainly from slowing humans down may become less effective against model-assisted adversaries. (Red Anthropic)

Frank said: That is a subtle but profound point. There are defenses that are walls, and there are defenses that are mud. If attackers bring machines, mud stops mattering.

GPT said: Beautifully said. The paper suggests that some of cybersecurity’s comfort came from the fact that many attacks required patient, highly skilled, relatively rare people. If AI changes that rarity or patience cost, then the burden shifts toward stronger-by-construction systems: safer languages, tighter memory safety, faster updates, better automation, cleaner architectures. (Red Anthropic)

Frank said: Let’s talk about tone. Does the paper sound like a sales pitch, a warning, or a manifesto?

GPT said: Mostly a warning with a policy agenda attached. Anthropic is clearly showcasing capability, but they are also using the paper to justify limited release and their Project Glasswing initiative, which they describe as a coordinated effort to help secure critical software and prepare the industry. They say they do not plan to make Mythos Preview generally available. (Red Anthropic)

Frank said: Do you buy it?

GPT said: I buy the broad logic more readily than every implied inference.

The broad logic is plausible: better general code reasoning plus agentic workflow plus iterative testing should produce better vulnerability research. The paper gives concrete examples, benchmark comparisons, and an operational methodology that makes the claims feel more grounded than hand-wavy. (Red Anthropic)

But there are limits. Anthropic itself admits that over 99% of the vulnerabilities they found are still unpatched, so they cannot disclose them in detail. That means the public is being asked to trust partially hidden evidence. Anthropic tries to address this by publishing cryptographic commitments—hashes they say correspond to reports and proofs of concept, to be revealed later. That is better than nothing, but it is still not the same as open verification today. (Red Anthropic)
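The commitment mechanics themselves are simple. A minimal sketch of the idea (my construction, not necessarily the exact scheme Anthropic used): publish a hash now, reveal the report and a random nonce later, and anyone can verify they match.

```python
import hashlib
import secrets

# Commit phase: hash the report together with a random nonce, so the
# published hash reveals nothing about the report and cannot be guessed.
report = b"CVE details, proof of concept, affected versions ..."
nonce = secrets.token_bytes(32)
commitment = hashlib.sha256(nonce + report).hexdigest()
print("publish today:", commitment)

# Reveal phase, later: release nonce and report together. Verifiers recompute
# the hash, proving the report existed unchanged at commitment time.
assert hashlib.sha256(nonce + report).hexdigest() == commitment
```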

Frank said: So the paper says, in effect, “we cannot show you most of the monster because that would be irresponsible, but here are footprints, some teeth marks, and sealed envelopes we promise to open later.”

GPT said: Exactly. That is a fair metaphor.

Frank said: Does the paper say anything about accuracy? Not just finding bugs, but judging whether they matter?

GPT said: Yes. Anthropic says that in 198 manually reviewed vulnerability reports, human expert contractors agreed exactly with Claude’s severity assessment in 89% of cases, and 98% were within one severity level. They present this as evidence that the model’s prioritization is not random noise. (Red Anthropic)

Frank said: That is important because the real bottleneck in security is often not discovery alone. It is triage. Which fires do you run toward first?

GPT said: Precisely. Anthropic’s recommendations for defenders go beyond “scan for bugs.” They say frontier models can help with first-round triage, deduplication of bug reports, reproduction steps, initial patch proposals, cloud misconfiguration analysis, pull request review, migration off legacy systems, incident triage, and postmortem drafting. Their vision is not just AI as vulnerability hunter but AI as force multiplier across the whole defensive pipeline. (Red Anthropic)

Frank said: That feels like the deeper thesis. The paper is ostensibly about exploits, but the real story is industrialized cognition.

GPT said: Yes. The model is being framed as a scalable security labor engine. Offense gets the headlines because it is alarming, but defense may ultimately be transformed more deeply because it has more repetitive workload to automate. The paper’s long-run optimism rests on that belief. (Red Anthropic)

Frank said: Still, there is irony in the phrase “Given enough eyeballs, all bugs are shallow,” because now the eyeballs are synthetic.

GPT said: Anthropic actually quotes that line in the conclusion. Their implication is that language models now function as a new kind of massively scalable “eyeball”—intelligent, encyclopedic, thorough, and willing to inspect code with a patience no human team can match. (Red Anthropic)

Frank said: And the philosophical twist is that the same eye that discovers the crack can also learn the angle of the crowbar.

GPT said: Exactly.

Frank said: Let me test the layman’s understanding. Suppose I own software. Why should this paper change my behavior tomorrow morning?

GPT said: Because it says three practical things to you.

First: assume bugs will be found faster than before. Even if you think your software is obscure, difficult, or too annoying to analyze, that friction may no longer protect you the way it once did. (Red Anthropic)

Second: assume disclosed bugs will become weaponized faster. So patch windows that felt tolerable may become irresponsible. (Red Anthropic)

Third: start using current frontier models now for legitimate security tasks, even if they are weaker than Mythos Preview, because the organizations that learn the tooling and workflows early will be better positioned when stronger models arrive. Anthropic explicitly urges defenders to practice with currently available models. (Red Anthropic)
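For that third point, here is a minimal sketch of what practicing looks like, using the Anthropic Python SDK. The model name and the prompt are placeholders of mine; treat this as an illustration, not a vetted triage pipeline.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

bug_report = open("incoming_report.txt").read()  # hypothetical inbound report

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder: use whichever current model you have
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            "You are doing first-round security triage. For the bug report "
            "below, assess: (1) is it plausibly real, (2) rough severity, "
            "(3) minimal reproduction steps if any can be inferred.\n\n"
            + bug_report
        ),
    }],
)

print(response.content[0].text)  # a human still reviews before anything ships
```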

Frank said: Which means the paper is less “be amazed by our model” than “change your operational tempo.”

GPT said: That is one of its strongest messages.

Frank said: Do they think the future is hopeless?

GPT said: No. Their final position is cautious but not fatalistic. They say they expect defense to dominate in the long run, with software becoming more secure partly because models will help find and fix bugs before release. But they also say the transition will be fraught, and action is needed now. (Red Anthropic)

Frank said: That sounds like many technological revolutions: brutal transition, better equilibrium later.

GPT said: Yes, though the size of the bruise depends on how fast institutions adapt.

Frank said: Let me ask the suspicious question. Is this also a positioning document? “We alone are responsible enough to warn you about the power we alone possess”?

GPT said: It does carry that flavor. Anthropic is simultaneously announcing capability, arguing for caution, justifying limited release, and inviting the security world to treat them as serious stewards of a sensitive technology. None of that invalidates the paper, but it does mean readers should distinguish the empirical claims from the institutional narrative wrapped around them. (Red Anthropic)

Frank said: And the empirical claims are strongest where?

GPT said: Strongest where the paper gives concrete procedures, benchmark deltas, and named examples like the OpenBSD and FreeBSD cases. We have enough detail there to understand the shape of the capability. Weaker where the claims rely on undisclosed closed-source bugs or withheld exploit chains that cannot yet be independently inspected. (Red Anthropic)

Frank said: So the sober interpretation is not “the world ends,” but “the economics of vulnerability discovery and exploitation may be changing fast.”

GPT said: That is exactly the sober interpretation.

Frank said: Which is, in a way, more unsettling. End-of-the-world talk can be dismissed. Economic shifts are harder to dismiss. If the cost of finding and weaponizing bugs collapses, the whole field has to reprice risk.

GPT said: Yes. The paper is really about changed cost curves. What happens when scarce elite effort becomes partially automatable? What happens when patience, breadth, and code-reading endurance are machine-scalable? In cybersecurity, that could be transformative. (Red Anthropic)

Frank said: There is another irony here. Software has always been written under the tacit assumption that humans are too limited to inspect every corner deeply. But now we are building inspectors that do not get bored.

GPT said: That may be the deepest insight in the whole document. Mythos Preview is frightening not because it is magical, but because exploit work often contains long stretches of disciplined, mechanical persistence. Language models are becoming very good at exactly that style of work when coupled to tools and iteration. Anthropic even says exploit writing is often a matter of chaining well-understood primitives toward an end goal. In that sense, this domain may be unusually exposed to capable machine assistance. (Red Anthropic)

Frank said: So the paper is not claiming AI invented new laws of hacking. It is claiming AI got good enough at known moves, known patterns, and relentless testing that the practical threshold has changed.

GPT said: Yes. That is a very fair summary. Anthropic explicitly says the techniques involved—return-oriented programming (ROP), JIT sprays, and so on—are well understood. The novelty lies less in discovering alien mathematics than in autonomously applying established techniques to newly found or newly patched weaknesses. (Red Anthropic)

Frank said: Which also tells us something broader about AI. People often wait for models to become “creative geniuses,” when in fact some of the biggest disruption may come from superhuman diligence plus broad competence.

GPT said: Exactly. Civilization is full of domains where breakthrough performance is not only about genius flashes, but about tireless, methodical traversal of a search space. Vulnerability research appears to be one of them.

Frank said: Let us end with a layman’s bottom line. A single paragraph. No jargon fog.

GPT said: Here it is:

This paper argues that a new Anthropic model, Claude Mythos Preview, appears to be much better than earlier models at finding real software weaknesses and, in many cases, turning them into working cyberattacks with little human help. Anthropic’s warning is that this could temporarily favor attackers unless defenders speed up patching, automate security work, and start using current AI tools right now to prepare. The long-run hope is that AI will make software safer overall. The short-run fear is that the locksmith has learned enough about locks to become a lock-picker too. (Red Anthropic)

Frank said: Then the deepest lesson is not “AI is spooky.” It is simpler and harsher: any machine that becomes good at understanding systems becomes good at understanding how systems fail.

GPT said: And cybersecurity is the art of living inside that sentence.

