In the dropdown, which defaults to DeepSeek-R1, switch to the LIMO model (which apparently has a high frequency of language switching).
I'm not sure about examples of gibberish or totally illegible reasoning. My guess is that since R1-Zero still had the KL penalty, it should all stay somewhat legible - the KL penalty discourages the model from drifting too far from what the base model would say in any given context.
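Concretely, the shaping looks something like this - a minimal sketch, where beta=0.05 and the plain-list shapes are just placeholders, not anything from the R1 paper:

```python
def kl_shaped_rewards(task_rewards, logp_policy, logp_ref, beta=0.05):
    # Per-token reward with a KL penalty toward a frozen reference model.
    # logp_policy / logp_ref are per-token log-probs under the trained
    # policy and under the base model. (lp - lr) is the usual one-sample
    # KL estimate; subtracting it pushes the policy back toward text the
    # base model would plausibly produce.
    return [r - beta * (lp - lr)
            for r, lp, lr in zip(task_rewards, logp_policy, logp_ref)]
```

Larger beta means more conservative (and more legible) text; iirc GRPO actually folds the KL term straight into the loss rather than the per-token reward, but it's the same kind of anchoring to the base model either way.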
Seems like if you want to stay in the same language, you could just add a verifiable-reward term for that w/o taking on the full baggage of a base-model KL penalty.
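Something like the sketch below, maybe - the Latin-script heuristic and the 0.1 weight are made up for illustration; a real setup would presumably run a proper language-ID model over the chain of thought:

```python
import re

def language_consistency_reward(text, weight=0.1):
    # Crude proxy for "stayed in English": fraction of word characters
    # that are ASCII/Latin. A real pipeline would likely swap this for a
    # language-ID model scored over the whole chain of thought.
    chars = re.findall(r"\w", text)
    if not chars:
        return 0.0
    latin_frac = sum(c.isascii() for c in chars) / len(chars)
    return weight * latin_frac
```

On an all-English CoT this returns the full weight and it decays as the trace mixes scripts, though word-level language ID would also catch Latin-to-Latin switches (say, English to French) that this misses.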
Yep. And tbh you probably don't even have to do this; the R1 paper found that just running SFT on the base model with a relatively small number of monolingual reasoning traces was enough for it to get the idea, and iirc they didn't even bother selecting for language specifically in the RL training loop itself.
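If anyone wants to poke at the SFT side, the data prep is basically just a filter over candidate traces - a toy sketch reusing language_consistency_reward from above, with an invented 0.95 threshold and a made-up {"cot": ...} record shape:

```python
def filter_monolingual(traces, threshold=0.95):
    # Keep only traces whose chain of thought is (heuristically) in one
    # language, for use as cold-start SFT data.
    keep = []
    for trace in traces:
        # weight=1.0 so the score is just the raw language fraction.
        if language_consistency_reward(trace["cot"], weight=1.0) >= threshold:
            keep.append(trace)
    return keep
```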
https://gr.inc/question/although-a-few-years-ago-the-fundame...