True, but this is not only a trade-off between opex and capex. Local inference u...

fireant · 2026-04-09T19:58:02 1775764682

I agree in principle that more democratic compute is better, and that third parties introduce additional risk outside of your control. That said, I just don't see it working economically. Either you have an underpowered GPU (four-digit price range), in which case you get a weak model, a slow model, or probably both. Or you have an expensive GPU cluster, but then you also need to consider utilization: you are probably not streaming tokens out 24/7, so the TCO of self-hosting ends up drastically higher.
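To make the utilization point concrete, here is a back-of-the-envelope sketch. Every number in it (cluster price, amortization period, power draw, electricity rate, throughput) is a made-up assumption for illustration. The model is deliberately crude: it ignores idle power, cooling, staff, and hardware failures. It only shows that amortized capex per token scales with 1/utilization, while energy per token stays flat.

```python
from dataclasses import dataclass, replace

HOURS_PER_YEAR = 24 * 365


@dataclass
class Deployment:
    hardware_cost_usd: float        # upfront capex (hypothetical)
    amortization_years: float       # period over which capex is written off
    power_kw: float                 # average draw while serving
    electricity_usd_per_kwh: float
    tokens_per_second: float        # sustained aggregate throughput when busy
    utilization: float              # fraction of hours actually serving, in (0, 1]


def cost_per_million_tokens(d: Deployment) -> float:
    """Amortized USD per 1M generated tokens.

    Capex is paid for every hour whether or not the hardware is busy;
    electricity is only counted for busy hours (idle draw is ignored).
    """
    if not 0 < d.utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    total_hours = d.amortization_years * HOURS_PER_YEAR
    busy_hours = total_hours * d.utilization
    energy_usd = d.power_kw * busy_hours * d.electricity_usd_per_kwh
    tokens = d.tokens_per_second * 3600 * busy_hours
    return (d.hardware_cost_usd + energy_usd) / tokens * 1_000_000


if __name__ == "__main__":
    # Hypothetical cluster: all figures are illustrative assumptions.
    saturated = Deployment(250_000, 3, 10, 0.15, 2_000, utilization=1.0)
    mostly_idle = replace(saturated, utilization=0.05)
    print(f"100% utilized: ${cost_per_million_tokens(saturated):.2f} per 1M tokens")
    print(f"  5% utilized: ${cost_per_million_tokens(mostly_idle):.2f} per 1M tokens")
```

With these assumptions the capex share per token is 20 times larger at 5% utilization than at 100%, which is the gap a shared provider closes by keeping the same hardware busy.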

Personally, I hope we see a third way: strong open-weight models hosted by a variety of companies actually competing on price and on nines of availability. That way the capex-heavy GPUs are fully utilized, and users can rent intelligence as a commodity.

There is a very apt analogy to virtual server hosting: VPS and shared web hosting are commodities, and it does not make financial sense for most users to host their website on their own physical servers in their basement.