I feel like we are getting closer and closer to finding the Goldilocks LLM, th...

djeastm · on May 31, 2024

>3.5 turbo performance

Is this the level of performance people are relying upon? While I've always been impressed with the technology itself, it's only starting with GPT-4 that I think it approaches adequate performance.

Decabytes · on May 31, 2024

For the work that I do (which is mostly RAG with a little bit of content generation), gpt-3.5-turbo-0125 with a 16k context window is the sweet spot for me. I started using the API when it only had a 4k context window, so the extra breathing room provided by the 16k window feels cavernous. Plus, the fact that it's $0.50 per 1 million tokens means that I can augment my software with LLM capabilities at a cost that is attractive to me as a small-time developer.
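To put that price in perspective, here is a rough back-of-the-envelope sketch. It assumes the quoted $0.50 per 1 million tokens applies uniformly to every token in a request (output tokens may be priced differently, which is not modeled here) and treats "16k" as 16,000 tokens. The function and constant names are made up for illustration.

```python
# Assumption: a flat rate of $0.50 per 1M tokens, as quoted in the comment.
PRICE_PER_MILLION_TOKENS_USD = 0.50
# Assumption: "16k" is treated as 16,000 tokens for round numbers.
CONTEXT_WINDOW_TOKENS = 16_000


def estimate_cost_usd(tokens: int,
                      price_per_million: float = PRICE_PER_MILLION_TOKENS_USD) -> float:
    """Return the estimated cost in USD of processing `tokens` tokens."""
    return tokens / 1_000_000 * price_per_million


# Cost of one request that fills the whole context window.
full_context_cost = estimate_cost_usd(CONTEXT_WINDOW_TOKENS)

# How many such maxed-out requests fit in one dollar.
requests_per_dollar = 1 / full_context_cost

print(f"Full 16k-token request: ${full_context_cost:.4f}")
print(f"Full-context requests per dollar: {requests_per_dollar:.0f}")
```

Under those assumptions, even a request that fills the entire window costs under a cent, which is roughly why the pricing works for a small-time developer.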

The way I rationalize it is that using 3.5-turbo is like programming on an 8-bit computer with kilobytes of RAM, and gpt-4o is like programming on a 64-bit computer with a 4080 ti and 32 GB of RAM. If I can make things work on the 8-bit system, they will work nicely on the more powerful system.

moffkalast · on May 31, 2024

3.5-turbo performance wasn't very good though, and according to analysis of API statistics it's an Nx7B model, so it's already rather small. Ultimately Llama-3-8B is already better in all measurable metrics except multilingual translation, but that's not saying much.

segmondy · on May 31, 2024

yeah, it's called phi-3-medium-4k-instruct

https://huggingface.co/microsoft/Phi-3-medium-4k-instruct

moffkalast · on May 31, 2024

It's not called anything until the lmsys leaderboard ranks it. Microsoft's blatant benchmark overfitting on Phi-2 makes for very little trust in what they say about performance. As a man once said: fool me once, shame on you; fool me twice... can't get fooled again.