
Usually they're hemorrhaging performance while training.

From that it's pretty likely they were training Mythos for the last few weeks, and then distilling it into Opus 4.7.

Pure speculation of course, but it would also explain the sudden performance gains for Mythos, and why they're not releasing it to the general public (because it's the undistilled version, which is too expensive to run).
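
"Distilling" here would just be the textbook setup: train the small model to match the big model's softened output distribution. A minimal sketch in PyTorch, where every detail (loss, temperature, everything) is my assumption, not anything known about Anthropic's pipeline:

  # Minimal sketch of plain soft-label distillation (Hinton-style KL loss).
  # Purely illustrative: nothing here reflects Anthropic's actual training.
  import torch
  import torch.nn.functional as F

  def distill_loss(student_logits, teacher_logits, temperature=2.0):
      # Soften both distributions, then pull the student toward the teacher.
      s = F.log_softmax(student_logits / temperature, dim=-1)
      t = F.softmax(teacher_logits / temperature, dim=-1)
      # The T^2 factor keeps gradient scale comparable across temperatures.
      return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

  # Toy usage: random logits standing in for the two models' forward passes.
  teacher_logits = torch.randn(8, 32_000)                      # frozen big model
  student_logits = torch.randn(8, 32_000, requires_grad=True)  # small student
  distill_loss(student_logits, teacher_logits).backward()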



Mythos is speculated to have 10 trillion parameters. Almost certainly they were training it for months.
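
Rough sanity check on "months", using the common ~6 * params * tokens FLOPs rule of thumb. Every number below is made up for illustration; only the arithmetic is real:

  # Back-of-envelope only; all quantities are speculation, not known figures.
  params = 10e12                      # rumored 10T parameters
  tokens = 10e12                      # assumed training tokens
  train_flops = 6 * params * tokens   # ~6e26 FLOPs total

  per_gpu = 1e15 * 0.4                # ~1 PFLOP/s peak at ~40% utilization
  gpus = 100_000                      # assumed cluster size
  days = train_flops / (gpus * per_gpu) / 86_400
  print(f"{days:.0f} days")           # ~174 days, i.e. months of training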


It is noticeable, though, that in the lead-up to a model release we always see massively degraded performance for the preceding few weeks.

It's been like that for every model release over the last year.


but how true is this? it's almost impossible to measure, and those who do try to measure it[1] find no significant difference
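
(one way a tracker can check this: treat each window's eval runs as pass/fail counts and run a two-proportion z-test. counts below are invented for illustration, not [1]'s actual data)

  # Hypothetical degradation check: compare pass rates from a "normal"
  # window vs. the weeks before a release. All counts are made up.
  from statsmodels.stats.proportion import proportions_ztest

  passes = [412, 398]   # tasks passed: normal window, pre-release window
  totals = [500, 500]   # tasks attempted in each window
  stat, p = proportions_ztest(passes, totals)
  print(f"z={stat:.2f}, p={p:.3f}")  # large p => no significant difference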

i personally haven't noticed any downgrade at all.

it's entirely possible there's a mass delusion going on: everyone gets wowed by 4.6 initially, then accepts the new baseline and gets used to it, then perceives that same baseline as degraded because it's no longer impressive

it doesn't help that anthropic suddenly changed the defaults of its claude code harness for all users

the best and only evidence i've seen for actual degradation is that the web version of opus 4.6 failed the car wash test, and since the web version doesn't let you simply "disable adaptive thinking" or adjust other parameters, you may genuinely have gotten a worse product

[1] https://marginlab.ai/trackers/claude-code-historical-perform...



