StrauXX · 2026-05-27T17:59:42 1779904782

Algorithms are also improving. I believe it's very unlikely for these two improvements together to not result in one to two orders of magnitude cheaper cost per "intelligence". Of course, that might just make use cases that are too expensive today viable and thereby increase usage further.

StrauXX · 2026-05-16T10:21:01 1778926861

LLMs don't tend to help much when solving challenges beyond their skill level. Either they one-shot a challenge, or they are almost useless as a companion for them.

StrauXX · 2026-05-16T10:17:33 1778926653

It is a hard requirement. Once you reach higher levels of challenges you spend most of your time reading through RFCs, web specs, GitHub issues, mailing lists, papers, random bugtrackers and library/framework code. There is no way to create a whitelist for that. Besides, a firewall won't stop good hackers.

Retr0id · 2026-05-16T10:27:06 1778927226

Normal CTF workflows can involve a lot of research but that's not the point. You can design self-contained challenges with offline solving in mind, and bundle any truly necessary docs/src/etc. with the challenge download.

StrauXX · 2026-05-09T09:21:23 1778318483

Which indications are that?

nicoburns · 2026-05-09T12:14:14 1778328854

The cost factors on the new models compared to the old models.

jeremyjh · 2026-05-09T13:33:03 1778333583

Qwen3.6 9B is as good as GPT-4o and runs on my M2 MacBook Air. Models are getting stronger and less costly at the same time, but these are somewhat separate branches of research. Frontier labs are spending more because they are still getting marginal returns and there is more capacity to spend than there was a year ago.

gertop · 2026-05-09T14:31:30 1778337090

Qwen 3.6 9B doesn't exist.

If you meant 3.5 9B and you truly believe it's as good as 4o then I can only assume you have a very basic use case.

jeremyjh · 2026-05-09T20:54:57 1778360097

You are right, I was mistaken about the version. I evaluated it in general chat assistant prompts plucked from my history across a range of topics but did not use it for coding - there was never a time when I thought 4o was “good enough” for agentic coding.

bdelmas · 2026-05-09T13:11:23 1778332283

You are mixing up cost and progress. Rising costs do not, by themselves, mean that progress is slowing down.

nicoburns · 2026-05-09T13:34:46 1778333686

They are intrinsically linked beyond a certain point. If we're making progress but costs are spiraling exponentially then it stands to reason that we will soon reach a point where we can no longer afford the increasing costs and thus progress will slow.

(barring some breakthrough that reduces costs, which of course may happen, but for which recent model improvements are not strong evidence)

aspenmartin · 2026-05-09T14:30:44 1778337044

Cost for a specific level of performance decreases 10x per year; this has been a pretty consistent property for a while now.
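
As a rough illustration of what a constant 10x yearly decline implies, the compounding can be sketched like this. The starting price of 100 units and the function name are hypothetical placeholders, not measured figures, and the sketch assumes the decline factor stays constant.

```python
def cost_after(years: float, initial_cost: float, yearly_factor: float = 10.0) -> float:
    """Cost of a fixed performance level after `years`.

    Assumes the cost falls by a constant `yearly_factor` every year.
    """
    return initial_cost / (yearly_factor ** years)


if __name__ == "__main__":
    # Hypothetical starting price of 100 units for some fixed capability level.
    for year in range(4):
        print(year, cost_after(year, 100.0))
```

Under that assumption, two years of decline already amount to two orders of magnitude.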

butlike · 2026-05-12T21:25:03 1778621103

I guess within the domain of AI, a pertinent question would be: "do I want to use anything but the best?" In my eyes, the errors older models make are directly analogous to being stupider.

aspenmartin · 2026-05-13T02:37:29 1778639849

Depends — many tasks in various pipelines have a reasonable Pareto frontier and diminishing returns after a certain level of performance. You may just have a high budget constraint (say like YouTube computing ASR subtitles; they are not going to be using the best ASR models because it’s expensive). If it’s myself, with a coding agent, I’m going to get the best thing I can afford.

overfeed · 2026-05-09T09:25:10 1778318710

Investment dollars.

dzhiurgis · 2026-05-09T10:42:10 1778323330

Source for that claim?

lionkor · 2026-05-09T10:25:41 1778322341

Nobody is releasing NEW models

aspenmartin · 2026-05-09T11:54:50 1778327690

…not only is this not true but it also doesn’t matter. Why would this indicate performance saturating?

taneq · 2026-05-09T10:59:05 1778324345

The standard networking connection has been called “Ethernet” for more than thirty years, so networking has stagnated, right?

SlinkyOnStairs · 2026-05-09T11:31:57 1778326317

If higher bandwidth networking consisted primarily of running more and more Ethernet lines in parallel, you would most certainly agree that "networking has stagnated".

"Reasoning" and now "Agentic" AI systems are not some fundamental improvement on LLMs, they're just running roughly the same prior-gen LLMs, multiple times.

Hence the conclusion that LLM improvement has slowed down, if not stagnated entirely, and that we should not expect the improvements of switching to these "reasoning" systems to keep happening.

p1esk · 2026-05-09T12:24:28 1778329468

From TFA:

“ChatGPT came up with an idea which is original and clever. It is the sort of idea I would be very proud to come up with after a week or two of pondering, and it took ChatGPT less than an hour to find and prove”

SlinkyOnStairs · 2026-05-09T12:33:16 1778329996

You misunderstand. I'm not saying that Reasoning/Agentic systems aren't better.

I'm saying they're not an advancement in the tech in the way GPT 1 through 3 were. They're a different kind of improvement.

And as such the rate of improvement cannot just be extrapolated into the future.

p1esk · 2026-05-09T12:55:05 1778331305

GPT1 through GPT3 advancements were exactly like using more Ethernet cables in parallel.

All interesting conceptual breakthroughs came after GPT3: RL and reasoning being the main ones.

kstenerud · 2026-05-09T11:49:07 1778327347

What constitutes a NEW model for the purposes of calculating progress?

GardenLetter27 · 2026-05-09T12:07:18 1778328438

What? DeepSeekV3 just came out and is incredible for the price. Mythos is also half-released.

nozzlegear · 2026-05-09T14:24:02 1778336642

Until you or I can actually use Mythos in Claude without an NDA or other strings attached, Mythos is not released and is just an effective marketing tool for Anthropic.

pixl97 · 2026-05-10T16:50:05 1778431805

At least to me this is a pretty sour grapes take. There are all kinds of released products that are expensive or need an NDA. You're just too poor to afford it. But make no mistake, there are governments using this en masse and likely against you.

nxobject · 2026-05-10T22:50:18 1778453418

I think that’s worthy of at least sour grapes, too.

StrauXX · 2026-04-26T21:11:56 1777237916

Self hosting at a reasonable scale is much cheaper than people think. I am running clusters of DGX Spark machines with BiFrost load balancers in our company and for client projects. They work flawlessly!

128 GB unified memory, Nvidia chip and ARM CPU for just around 3k€ net. They easily push ~400 input and ~100 output tokens per second per device on, say, gpt-oss-120b. With two devices in a cluster, that's enough performance for >20 concurrent RAG users or >3 "AI augmented" developers.

And they don't even pull that much power.
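
To make the arithmetic behind those numbers explicit, here is a minimal sketch. It assumes that throughput scales linearly with the number of devices and that load is spread evenly while every user generates at the same time (a worst case); neither assumption is stated above, and the function name is made up for illustration.

```python
def per_user_output_rate(devices: int, output_tps_per_device: float, concurrent_users: int) -> float:
    """Output tokens per second available to each user.

    Assumes linear scaling across devices and an even split between
    users who are all generating simultaneously (worst case).
    """
    return devices * output_tps_per_device / concurrent_users


if __name__ == "__main__":
    # Figures quoted in the comment: 2 devices, ~100 output tokens/s each.
    print(per_user_output_rate(2, 100, 20))  # 20 concurrent RAG users -> 10.0
    print(per_user_output_rate(2, 100, 3))   # 3 "AI augmented" developers
```

In practice users rarely all generate at once, so the per-user rate seen in real use would likely be higher than this floor.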

byzantinegene · 2026-04-27T09:24:11 1777281851

Factor in depreciation and energy costs, and a subscription might just be cheaper.

StrauXX · 2026-04-27T13:04:51 1777295091

It is definitely cheaper now. What I mean is that token costs rising so dramatically that AI usage becomes uneconomical is not a high-probability future, even if AI subscriptions were being sold heavily below cost (which is also unlikely, after R&D).
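
For anyone who wants to run the depreciation and energy comparison themselves, a rough sketch follows. Only the ~3k€ device price comes from this thread; the 36-month lifetime, the 200 W average draw for the whole cluster and the 0.30 €/kWh electricity price are placeholder assumptions to be replaced with your own numbers.

```python
def monthly_self_hosting_cost(
    hardware_eur: float,
    lifetime_months: int,
    avg_power_watts: float,
    eur_per_kwh: float,
) -> float:
    """Monthly cost of owned hardware: straight-line depreciation plus energy.

    Ignores staff time, hosting space and resale value.
    """
    depreciation = hardware_eur / lifetime_months
    hours_per_month = 24 * 30  # simplification: 30-day month, running 24/7
    energy = avg_power_watts / 1000 * hours_per_month * eur_per_kwh
    return depreciation + energy


if __name__ == "__main__":
    # Two devices at ~3k EUR each (from the thread); everything else is assumed.
    total = monthly_self_hosting_cost(
        hardware_eur=2 * 3000,
        lifetime_months=36,    # assumption
        avg_power_watts=200,   # assumption, whole cluster
        eur_per_kwh=0.30,      # assumption
    )
    users = 20  # the ">20 concurrent RAG users" figure from the thread
    print(f"cluster: {total:.2f} EUR/month, per user: {total / users:.2f} EUR/month")
```

Comparing the per-user figure with a subscription price only makes sense if the self-hosted model is good enough for the task at hand.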

StrauXX · 2026-04-22T17:18:26 1776878306

The party is called the "Christian Democratic Party" but in practice pushes no Christian policies. 47% of Germans are legally atheists. Only 5% regularly visit mass.

SpicyLemonZest · 2026-04-22T17:28:16 1776878896

All very true, but I don't think it contradicts my point. Perhaps the lower prevalence of religious participation in Germany makes German secularists more comfortable with religious symbols and practices.

StrauXX · 2026-04-20T13:43:40 1776692620

He has changed his opinion completely. Yes, the ratio has turned.

StrauXX · 2026-04-20T10:30:06 1776681006

Only Stripe offers a service doing international VAT for you.

deaux · 2026-04-20T11:11:52 1776683512

Isn't that largely the selling point of every MoR? Paddle, Polar.sh and so on.

StrauXX · 2026-04-16T08:12:50 1776327170

vLLM isn't suitable for people running LLMs side-by-side with regular applications on their PC. It is very good at hosting LLMs for production on dedicated servers. For the prod use case Ollama/llama.cpp are practically useless (but that's ok - it's not the projects' goal to be).

StrauXX · 2026-04-15T15:13:37 1776266017

Northdata [1] does basically this, but mostly ingests European data currently. Perhaps they will expand to ingest US data as well at some point. Not affiliated with them in any way. I just use them to look into company structures every now and again.

[1] https://www.northdata.com/