jessepcc's comments

At this point, frontier model releases come at a monthly cadence (Kimi 2.6, Claude 4.6, GPT 5.5). The interesting question is which evals will still be meaningful in six months.

More like weekly, or almost daily: GPT 5.5 came out literally 12 hours ago.

The n=19 sample and the self-selection bias here are load-bearing problems that the paper undersells. Participants were recruited because they reported harms, and their message volumes (20k+ per user on average) already suggest they were in deep trouble. Interesting read.


With the coding slot machine, I prefer to move fast and start over if anything goes off track. Maybe the number of tokens spent across several iterations ends up similar to what a more carefully planned system like GSD would use.
