At this point 'frontier model release' is a monthly cadence (Kimi 2.6, Claude 4.6, GPT 5.5); the interesting question is which evals will still be meaningful in six months.
The n=19 sample and self-selection bias are load-bearing problems that the paper undersells. The message volumes (20k+ per user on average) already suggest the participants were in deep trouble, and they were recruited precisely because they reported harms. Interesting read nonetheless.
With the coding slot machine, I prefer to move fast and start over if anything goes off track. The number of tokens spent across several iterations may well be similar to using a more carefully planned system like GSD.