LLM inference optimization was key to OpenAI's GPT-4o presentation (2x faster, 50% cheaper), and it's driving lots of industry research because it translates directly into cost savings. Still, it's refreshing to see so many of these techniques published as papers (e.g. from Stanford, Berkeley…)