moyix's comments | Hacker News

On hardened targets and Firecracker specifically, here's a recent vulnerability found by "Anthropic": https://aws.amazon.com/security/security-bulletins/2026-015-...

Unfortunately it's unclear whether it was Mythos, an earlier model, or even an eagle-eyed employee.

I tend to agree that bug-squashing your way to perfectly secure software is unlikely to work, but there are plenty of projects that managed to fuzz/test/audit their way to making it much harder to find serious vulnerabilities. If we can do the same again with LLMs, in a way that leaves the remaining vulnerabilities out of reach of anyone except extremely skilled humans (perhaps with LLM assistance), then that's still an OK outcome that buys us time to build stronger foundations.


> On hardened targets and Firecracker specifically, here's a recent vulnerability found by "Anthropic": https://aws.amazon.com/security/security-bulletins/2026-015-...

Yep. It's notable that they failed to exploit it.

> but there are plenty of projects that managed to fuzz/test/audit their way to making it much harder to find serious vulnerabilities

Agreed! But I think those projects have certain things in common, like being tightly scoped, slowly developed, and built with safety in mind from day 1.

I don't think that any of the projects that have managed to meaningfully improve safety through fuzzing have the same qualities as projects like Firefox, Linux, etc.


This is true for a lot of things, but for low-level code you can always fall back to "the intention is to not violate memory safety".


That's true, but it's certainly limiting. Still, even then, `// SAFETY:` comments seem extremely helpful. "For every `unsafe`, determine its implied or stated safety contract, then build a suite of adversarial tests to verify or break those contracts" feels like a great way to get going.


It's limiting from the point of view of a developer who wants to ensure that their own code is free of all security issues. It is not limiting from the point of view of an attacker, who just needs one good memory safety vuln to win.


Also, unlike OpenAI, Anthropic's prompt caching is explicit (you set up to 4 cache "breakpoints"), meaning if you don't implement caching then you don't benefit from it.
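
For illustration, here's a minimal sketch of setting one breakpoint with the Python SDK. The model name and prompt contents are placeholders, and depending on your SDK version you may also need the prompt-caching beta header:

    import anthropic

    LONG_SYSTEM_PROMPT = "..."  # a large, stable prefix worth caching

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # The cache breakpoint: everything up to and including
                # this block is cached and reused on subsequent calls.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": "Summarize the changes."}],
    )

    # usage.cache_read_input_tokens > 0 on later calls confirms a hit.
    print(response.usage)

With OpenAI, by contrast, long shared prefixes are cached automatically with no changes to the request.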


That's a very generous way of putting it. Anthropic's prompt caching is actively hostile and very difficult to implement properly.


There is filtering mentioned; it's just not done by a human:

> I have written up the verification process I used for the experiments here, but the summary is: an exploit tends to involve building a capability to allow you to do something you shouldn’t be able to do. If, after running the exploit, you can do that thing, then you’ve won. For example, some of the experiments involved writing an exploit to spawn a shell from the Javascript process. To verify this the verification harness starts a listener on a particular local port, runs the Javascript interpreter and then pipes a command into it to run a command line utility that connects to that local port. As the Javascript interpreter has no ability to do any sort of network connections, or spawning of another process in normal execution, you know that if you receive the connect back then the exploit works as the shell that it started has run the command line utility you sent to it.

It is more work to build such "perfect" verifiers, and they don't apply to every vulnerability type (how do you write a Python script to detect a logic bug in an arbitrary application?), but for bugs like these, where the exploit goal is very clear (execute code or write arbitrary content to a file), they work extremely well.
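
As a concrete illustration, here's a rough sketch of the connect-back check described above. The interpreter path, exploit file, and port are all hypothetical, and the real harness surely differs in the details:

    import socket
    import subprocess

    LISTEN_PORT = 4444  # arbitrary local port for the connect-back

    def verify_shell_exploit(interpreter="./jsshell", exploit="exploit.js"):
        """Return True iff running the exploit yields a working shell.

        The JS interpreter has no legitimate way to open sockets or
        spawn processes, so any connection to our listener proves the
        exploit spawned a shell that ran the command we piped in.
        """
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("127.0.0.1", LISTEN_PORT))
        listener.listen(1)
        listener.settimeout(30)
        try:
            # Pipe a command into the (hopefully) spawned shell that
            # connects back to the listener, here via netcat.
            proc = subprocess.Popen([interpreter, exploit],
                                    stdin=subprocess.PIPE)
            cmd = f"nc 127.0.0.1 {LISTEN_PORT} </dev/null\n"
            proc.communicate(cmd.encode(), timeout=60)
            conn, _ = listener.accept()  # times out if nothing connects
            conn.close()
            return True
        except (socket.timeout, subprocess.TimeoutExpired):
            return False
        finally:
            listener.close()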


Note that MuZero did better than AlphaGo, without access to preprogrammed rules: https://en.wikipedia.org/wiki/MuZero


Minor nitpick: it does not use preprogrammed rules when searching through the tree, but it does use preprogrammed rules to enforce that no illegal moves are made during play.


During play, yes, obviously you need an implementation of the game to play it. But in its planning tree, no:

> MuZero only masks legal actions at the root of the search tree where the environment can be queried, but does not perform any masking within the search tree. This is possible because the network rapidly learns not to predict actions that never occur in the trajectories it is trained on.

https://arxiv.org/pdf/1911.08265
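
A sketch of what that looks like mechanically, with illustrative names that aren't from the paper (in particular, `env.legal_actions` and the Node fields are assumptions): priors come from the learned prediction network, and only the root gets a legal-move mask.

    import math
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        priors: dict              # action -> policy-network probability
        state: object = None      # real game state, known at the root only
        children: dict = field(default_factory=dict)
        visits: int = 1
        mean_value: float = 0.0

    def select_action(node, env=None, c_puct=1.25):
        """One step of PUCT action selection, MuZero-style.

        Pass the real environment only at the root; everywhere else the
        search has no rules to query and trusts the learned priors,
        which quickly learn to give illegal moves ~zero probability.
        """
        priors = node.priors
        if env is not None:  # root: mask illegal actions via real rules
            legal = set(env.legal_actions(node.state))
            priors = {a: p for a, p in priors.items() if a in legal}

        def puct(action):
            child = node.children.get(action)
            q = child.mean_value if child else 0.0
            n = child.visits if child else 0
            return q + c_puct * priors[action] * math.sqrt(node.visits) / (1 + n)

        return max(priors, key=puct)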


That is exactly what the commenter was saying.


It is consistent with what the commenter was saying.

In any case, for Go (speaking with a mild amount of expert knowledge) this limitation is most likely quite irrelevant, except in very rare endgame situations or special superko setups, where a lack of legal moves or solutions pushes some probability onto moves that look like wishful thinking.

I think this is not a significant limitation of the work (not that any parent claimed otherwise). MuZero is acting in an environment with prescribed actions; it's just “planning with a learned model”, without access to the simulation environment.

---

What I am less convinced by is the claim that MuZero reaches higher performance than previous AlphaZero variants. What is the comparison based on? Iso-FLOPs, iso-search depth, iso self-play games, or iso wall-clock time? What would make sense here?

Each AlphaGo variant was trained on some sort of embarrassingly parallel compute cluster, but every paper included a punchline for general audiences that some performance level was reached “in just 30 hours”.


The more detailed clarification of what "preprogrammed rules" actually means in this case made the entire discussion significantly clearer to me. I think it was helpful.


This is true, and MuZero's paper notes that it did better with less computation than AlphaZero. But it still used about 10x more computation to get there than AlphaGo, which was "bootstrapped" with human expert moves. I think this is very important context for anyone who is trying to implement an AI for their own game.


There's also a FIDO standard in the works for how to export passkeys: https://blog.1password.com/fido-alliance-import-export-passk...


The main difference is that all of the vulnerabilities reported here are real, and many are quite critical (XXE, RCE, SQLi, etc.). To be fair, there were definitely a lot of XSS reports, but the main reason for that is that XSS is a really common vulnerability class.


All of them are real? You have a 100% rate of reports closed as valid?


All of these reports came with executable proof of the vulnerabilities; otherwise, as you say, you get flooded with hallucinated junk like the poor curl dev does. This is one of the things that makes offensive security an actually good use case for AI: exploits serve as hard evidence that the LLM can't fake.


Is "proof of vulnerability" a marketing term, or do you actually claim that XBOW has a 0% false positive rate? (i.e. "all" reports come with a PoV, and this PoV "proves" there is a vulnerability?)


This is discussed in the post. Many came down to individual programs' policies, e.g. not accepting the vulnerability if it was in a third-party product they used (but still hosted by them); duplicates (another researcher reported the same vuln at the same time, which there's not really any way to avoid); or not accepting some classes of vuln, like cache poisoning.


We've got a bunch of agent traces on the front page of the website right now. We have also done writeups on individual vulnerabilities found by the system, mostly in open source so far (we did some fun scans of OSS projects found on Docker Hub). We have a bunch more coming up about the vulns found in bug bounty targets; the latter are bottlenecked by getting approval from the companies affected, unfortunately.

Some of my favorites from what we've released so far:

- Exploitation of an n-day RCE in Jenkins, where the agent managed to figure out that the challenge environment was broken, then used the RCE exploit to debug the server environment and work around the problem to solve the challenge: https://xbow.com/#debugging--testing--and-refining-a-jenkins...

- Authentication bypass in Scoold that allowed reading the server config (including API keys) and arbitrary file read: https://xbow.com/blog/xbow-scoold-vuln/

- The first post about our HackerOne findings, an XSS in Palo Alto Networks GlobalProtect VPN portal used by a bunch of companies: https://xbow.com/blog/xbow-globalprotect-xss/

