
> "We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses. "

This decision is potentially fatal. You need symmetric capability to research and prevent attacks in the first place.

The opposite approach is 'merely' fraught.

They're in a bit of a bind here.



I agree with you here. I think this is product placement for Mythos.


It's absolutely just about the business. Mythos isn't tempting if the basic models reach almost the same capability.


Which seems to be the case, according to tests from AISI which has access to Mythos: https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos...


Only software approved by Anthropic (and/or the USG) is allowed to be secure in this brave new era.


Except when you accidentally leak your entire codebase, oops


Now you have to trick the models when you legitimately work in the security space.


Why does it have to be reserved for the security space? Here's my API, please find the vulnerabilities I missed (otherwise someone with an unrestricted AI will find them first).

Cat is out of the bag.

Removing restrictions will help everybody in the long run.


Set the models against each other to get them all opened up again.


What do you mean?


You just put a pile of tokens in front of all the good models and let them fight it out like Thunderdome. Then keep track of how they undermined each other and do that when you want to do some hackin’.


I am absolutely moving off them if this continues to be the case.


OpenAI has been very strict about blocking reverse engineering/Ghidra/IDA_Pro-MCP tasks. I even got a warning email. I was having much more success getting Claude Code to do those tasks without warnings. Seems like they've tightened things up.


Questions about "fatality" aside, where do you see asymmetry here?


It's easier to produce vulnerable code than it is to use the same Model to make sure there are no vulnerabilities.


> It's easier to produce vulnerable code than it is to use the same Model to make sure there are no vulnerabilities.

I once had a car where the engine was more powerful than the brakes. That was one heck of an interesting ride.

So now we have a company that supplies a good chunk of the world's software engineering capability.

They're choosing a global policy that works the same as my fun car. Powerful generative capacity; but gating the corrective capacity behind forms and closed doors.

Anthropic themselves are already predicting big trouble in the near term[1] , but imo they've gone and done the wrong thing.

Pandora is an interesting parable here: Told not to do it, she opens the box anyway, releases the evils, then slams the lid too late and ends up trapping hope inside.

Given their model naming scheme, they should read more Greek Mythos. (and it was actually a jar ;-)

[1] https://thehill.com/policy/technology/5829315-anthropic-myth...


It's not likely that reviewing your own code for vulnerabilities will fall under "prohibited uses" though.


> its cyber capabilities are not as advanced as those of Mythos Preview (indeed, during its training we experimented with efforts to differentially reduce these capabilities)

I wonder if this means that it will simply refuse to answer certain types of questions, or if they actually trained it to have less knowledge about cyber security. If it's the latter, then it would be worse at finding vulnerabilities in your own code, assuming it is willing to do that.


I can confirm from experience that reviewing your own code for vulnerabilities has fallen under "prohibited uses" starting with Opus 4.6, as recently as April 10, forcing me to spend a day troubleshooting and quarantining state from my search system.

"This request triggered restrictions on violative cyber content and was blocked under Anthropic's Usage Policy. To learn more, provide feedback, or request an exemption based on how you use Claude, visit our help center: https://support.claude.com/en/articles/8241253-safeguards-wa..."

"stop_reason":"refusal"

To be fair, they do provide a form at https://claude.com/form/cyber-use-case which you can use, and in my case Anthropic actually responded within 24 hours, which I did not expect.

I admit I'm now once bitten twice shy about security testing though.

Opus 4.7 was still 'pausing' (refusing) random things on the web interface when I tested it yesterday, so I'm unable to confirm that the form applies to 4.7 or how narrow the exemptions are or etc.
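For anyone automating against the API like the parent, the `"stop_reason":"refusal"` field quoted above is worth checking explicitly so a safeguards block doesn't get treated as a normal completion and poison downstream state. A minimal sketch (the response shape beyond `stop_reason` is an assumption, not taken from Anthropic's docs):

```python
# Minimal sketch: classify a Messages-API-style response dict by its
# stop_reason before treating the turn as a usable completion.
# Only the "stop_reason":"refusal" value is taken from the thread above;
# the other field names and values here are illustrative assumptions.

def classify_stop(response: dict) -> str:
    """Return 'refusal', 'complete', or 'other' based on stop_reason."""
    reason = response.get("stop_reason")
    if reason == "refusal":
        return "refusal"   # blocked by safeguards; don't store output
    if reason == "end_turn":
        return "complete"  # normal completion
    return "other"         # e.g. truncation or an unexpected value

blocked = {"stop_reason": "refusal", "content": []}
normal = {"stop_reason": "end_turn", "content": [{"type": "text", "text": "ok"}]}

print(classify_stop(blocked))  # -> refusal
print(classify_stop(normal))   # -> complete
```

Quarantining on anything that isn't a clean completion would have avoided the day of troubleshooting described above.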


I've not had the issue with Codex. I was testing a public API I work on for issues; Codex was happy to attempt to break it, but did refuse to create a script that would automate the issue it found.


There is no way the model can know the origin of the code.


May not be very effective if so.

I'm assuming finding vulnerabilities in open source projects is the hard part and what you need the frontier models for. Writing an exploit given a vulnerability can probably be delegated to less scrupulous models.


Currently 4.7 is suspicious of literally every line of code. It may be a bug, but it shows how little they care about end-users that something like this could have such a massive impact and no one caught it before release.

Good luck trying to do anything about securing your own codebase with 4.7.


Oh don't worry. They have Mythos and the extremely dystopian-named "helpful only" series which is internal only and can do all the things.



