
LLM inference optimization was key to OpenAI's GPT-4o presentation (2x faster, 50% cheaper), and it's driving a lot of industry research because it translates directly into cost savings, but it's refreshing to see so many techniques published as papers (e.g., from Stanford, Berkeley…)

