
I feel like we are getting closer and closer to finding the Goldilocks LLM: one that, with some smarter training and the right set of parameters, gets us close to GPT-3.5 Turbo performance, but at a significantly lower size, cost, and time effort, and that is runnable locally.

Combine that with what seems like every chip adding a neural engine, and it feels like we are in the early days of high-performance graphics again. Right now we are in the Unreal Engine/Voodoo era, where graphics cards/neural engines are expensive and rare. But give it a few generations and we can assume that even standard computers will have pretty decent NPUs, and developers will be able to rely on that for models.



>3.5 turbo performance

Is this the level of performance people are relying upon? While I've always been impressed with the technology itself, I think it only starts to approach adequate performance with GPT-4.


For the work that I do (mostly RAG with a little bit of content generation), gpt-3.5-turbo-0125 with a 16k context window is the sweet spot. I started using the API when it only had a 4k context window, so the extra breathing room of 16k feels cavernous. Plus, at $0.50 per 1 million tokens, I can augment my software with LLM capabilities at a cost that is attractive to me as a small-time developer.

The way I rationalize it is that using 3.5-turbo is like programming on an 8-bit computer with kilobytes of RAM, and gpt-4o is like programming on a 64-bit computer with a 4080 Ti and 32 GB of RAM. If I can make things work on the 8-bit system, they will work nicely on the more powerful system.
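
For a sense of scale on the $0.50 per 1 million tokens figure, here is a back-of-the-envelope sketch. It assumes a flat per-token rate and rounds "16k" to 16,000 tokens; the function and constant names are made up for illustration and are not part of any API.

```python
# Rough cost estimate at a flat per-million-token rate.
# Assumptions: the $0.50 figure quoted above applies uniformly to every token,
# and "16k" is rounded to 16,000 tokens. Names here are hypothetical.

PRICE_PER_MILLION_TOKENS_USD = 0.50
CONTEXT_WINDOW_TOKENS = 16_000


def request_cost_usd(tokens: int,
                     price_per_million: float = PRICE_PER_MILLION_TOKENS_USD) -> float:
    """Return the cost in USD of processing `tokens` tokens at a flat rate."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens * price_per_million / 1_000_000


if __name__ == "__main__":
    # Cost of one request that fills the whole (rounded) 16k window.
    full_window = request_cost_usd(CONTEXT_WINDOW_TOKENS)
    print(f"One full 16k-token request: ${full_window:.4f}")
```

Under those assumptions, even a request that fills the entire window costs less than a cent, which is why the price point works for a small-time developer.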


3.5-turbo performance wasn't very good though, and according to statistical analysis of the API it's an Nx7B model, so it's already rather small. Ultimately, Llama-3-8B is already better on all measurable metrics except multilingual translation, but that's not saying much.


Yeah, it's called Phi-3-medium-4k-instruct

https://huggingface.co/microsoft/Phi-3-medium-4k-instruct


It's not called anything until the lmsys leaderboard ranks it. Microsoft's blatant benchmark overfitting on Phi-2 makes for very little trust in what they say about performance. As a man once said: fool me once, shame on you; fool me twice, can't get fooled again.



