This looks impressive, but I'm concerned: is it fair to "teach to the test" by fine-tuning the Qwen model with RL on the test task while the other models in the comparison are not fine-tuned on it?
Yeah, the takeaway shouldn't be "our model is smarter," but that we were able to train weak models to perform as well as or better than the best models on this specific task. It depends on what you're doing, but sometimes that's enough.