
The "win matrix" (dissimilarity matrix) seems very interesting. It looks like, e.g., Vicuna-13B paired against GPT-4 wins 20% of the time. That's a larger difference than I'd have guessed based on the scores.


Yeah, the win matrix is what you want to look at if you haven't internalized or memorized what various Elo differences mean.
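
For anyone who wants the conversion spelled out, here is a rough sketch. It assumes the textbook Elo logistic curve (base 10, scale 400); the leaderboard's own fitting may differ, and the function names are just made up for illustration.

```python
import math


def expected_win_rate(rating_a, rating_b, scale=400.0):
    """Expected win rate of A over B under the standard Elo curve.

    Assumption: base-10 logistic with a 400-point scale (textbook Elo),
    not necessarily what any particular leaderboard uses.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def rating_gap_for_win_rate(p, scale=400.0):
    """Inverse: the rating gap (A minus B) implied by A winning a fraction p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return scale * math.log10(p / (1.0 - p))


if __name__ == "__main__":
    # A 100-point deficit under these assumptions
    print(round(expected_win_rate(1000, 1100), 3))
    # Gap implied by a 20% win rate
    print(round(rating_gap_for_win_rate(0.2), 1))
```

Under those assumptions, a 20% win rate works out to a gap of roughly 240 points, and a 100-point deficit still wins about 36% of the time.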



