I think this is the case. In the early GPT-4 days I tested the same model side by side across the subscription and API. The API always produced a longer better answer. To me it felt like the API model was working how it was supposed to work while the subscription model tried to reduce its token usage. From a business perspective that would make sense. I then switched to API only because I felt like it was worth the extra cost.
I did a similar test with sonnet about 6 months ago and noticed no difference, except that the subscription was way cheaper than API access. This is not the case anymore, at least not for me. The subscription these days only lasts for a few requests before it hits the usage limit and goes over to ”extra usage” billing. Last week I burned through my entire subscription budget and 80$ worth of extra usage in about 1h. That is not sustainable for me and the reason I started looking at alternatives.
From a business perspective it all makes sense. Anthropic recently gave away a ton of extra usage for free. Now people have balance on their accounts that Anthropic needs to pay for with compute, suddenly they release a model that seem to burn those tokens faster than ever. Last week I felt like the model did the opposite, it was stopping mid implementation and forgetting things after only 2 turns. Based on the responses I got it seemed like they were running out of compute, lobotomized their model and made it think less, give shorter answers etc. Probably they are also doing A/B testing on every change so my experience might be wildly different from someone else.
The UIs all bake in system prompts and other tunable configs that the API leaves open, so does Claude Code and other harnesses. So anything you notice different over the API when you're controlling the client is almost certainly that. Note that this is kind of something they have to do because consumer UI users will do stuff like ask models their name or date, or want it to respond politely and compassionately, and get upset/confused when they just get what's in the weights.
The problem with subscriptions for this kind of stuff is that it's just incompatible with their cost structure. The worst being, subscription usage is going to follow a diurnal usage pattern that overlaps with business/API users, so they're going to have to be offloaded to compute partners who most likely charge by the resource-second. And also, it's a competitive market, anybody who wants usage-based pricing can just get that.
So you basically end up with adverse selection with consumer subscription models. It's just kind of an incoherent business model that only works when your value proposition is more than just compute (which has a usage-based, pretty fungible market)
> In the early GPT-4 days I tested the same model side by side across the subscription and API. The API always produced a longer better answer.
If you are comparing responses in ChatGPT to the API, it's apples and oranges, since one applies a very opinionated system prompt and the other does not.
Since you haven't figured that out in 3 years, I didn't bother reading the rest of your comment.
I don’t know about ChatGPT, but in Claude Code I _have_ been able to do a side-by-side comparison of API-based metered billing vs subscription billing, in the same UI. You just switch from one to the other using /login.
You should probably not be so quick to dismiss what people say as nonsense.
I did a similar test with sonnet about 6 months ago and noticed no difference, except that the subscription was way cheaper than API access. This is not the case anymore, at least not for me. The subscription these days only lasts for a few requests before it hits the usage limit and goes over to ”extra usage” billing. Last week I burned through my entire subscription budget and 80$ worth of extra usage in about 1h. That is not sustainable for me and the reason I started looking at alternatives.
From a business perspective it all makes sense. Anthropic recently gave away a ton of extra usage for free. Now people have balance on their accounts that Anthropic needs to pay for with compute, suddenly they release a model that seem to burn those tokens faster than ever. Last week I felt like the model did the opposite, it was stopping mid implementation and forgetting things after only 2 turns. Based on the responses I got it seemed like they were running out of compute, lobotomized their model and made it think less, give shorter answers etc. Probably they are also doing A/B testing on every change so my experience might be wildly different from someone else.