Hacker News | GaggiX's comments

$0.435/$0.87 for the standard speed, this one should be 3 times that.

Not to be confused with Flash Attention.

What's novel here is the extremely small KV cache memory usage for long context windows, like 0.77GB at 512K, a 90% reduction compared to the already very small KV cache memory usage of DeepSeek V4 Flash.
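As a rough sanity check on those figures, here is a minimal sketch that turns them into a per-token cost and the baseline the 90% claim implies. It assumes "512K" means 512 * 1024 tokens and "GB" means decimal gigabytes; neither is stated in the comment, and the variable names are only illustrative.

```python
KV_CACHE_GB = 0.77            # figure quoted in the comment
CONTEXT_TOKENS = 512 * 1024   # ASSUMPTION: "512K" read as 512 * 1024 tokens
REDUCTION = 0.90              # "90%" reduction quoted in the comment

# ASSUMPTION: decimal gigabytes (1 GB = 10**9 bytes).
bytes_per_token = KV_CACHE_GB * 1e9 / CONTEXT_TOKENS

# If 0.77GB is what remains after a 90% cut, the baseline is 0.77 / (1 - 0.90).
implied_baseline_gb = KV_CACHE_GB / (1 - REDUCTION)

print(f"{bytes_per_token:.0f} bytes of KV cache per token")
print(f"implied baseline: {implied_baseline_gb:.1f} GB at the same context length")
```

Under those assumptions this works out to roughly 1.5KB of cache per token.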


>I’ll take a few f bombs and the truth.

I don't want to ruin it, but go read some of the author's old posts about AI: the tone is the same, and he is very much wrong.


If MiMo v2.5 Pro can run at >1000 tok/s on GPUs, then I expect the same from OpenAI/Anthropic/Google soon.

I wouldn't expect any of the American labs to be particularly good at (or have much desire for) working on efficiency; they have consistently proven to be uninterested in (if not incapable of) actually improving on those kinds of things. The closest we've seen lately is that maybe GPT-5.5 (and Opus 4.{7,8}?) are more token-efficient, i.e. they solve things with fewer tokens...? It hasn't been coupled with any other kind of efficiency bump, though, and we're seeing higher costs anyway in most places where the American labs are involved.

The only players that seem to be capable of a consistent pattern of doing more with less currency are the Chinese labs.


> That's technically encoding

Isn't that just projecting the patches into the d_model-sized vectors that the model takes?

>I am assuming that involves of quantization

A 12B model in 16GB seems very reasonable to me; int8 is top quality for running models.


I don’t think so, the HF weights are bf16 which means 24GB + cache/overhead.

It sounds like marketing spin where the performance claims are based on BF16 and the “runs in 16GB” claim is on a totally different quantized version.


The guide describes it as projection although there is apparently an extra step: "A factorized coordinate lookup (X and Y matrices) attaches spatial location information directly to the input."

12B at int8 would take up 12GB of memory, or 75% of the system memory, which technically fits within 16GB but the OS will not like that. EDIT: On my 18GB MacBook Pro, LM Studio reports a "partial GPU offload" for the int8 MLX weights. I can't test it because the `gemma_unified` architecture is not yet implemented.
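The numbers in this exchange can be reproduced with a short sketch. It counts the weights only (no KV cache, activations, or runtime overhead) and uses decimal gigabytes; the helper name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in decimal GB."""
    return n_params * bytes_per_param / 1e9

N_PARAMS = 12e9  # the 12B model under discussion

bf16_gb = weight_memory_gb(N_PARAMS, 2)  # bf16 stores 2 bytes per parameter
int8_gb = weight_memory_gb(N_PARAMS, 1)  # int8 stores 1 byte per parameter

print(f"bf16: {bf16_gb:.0f} GB")
print(f"int8: {int8_gb:.0f} GB ({int8_gb / 16:.0%} of 16 GB)")
```

This gives 24 GB for bf16 and 12 GB for int8, the latter being 75% of a 16 GB machine before anything else is loaded.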


Yeah, and it’s pretty memory efficient with only 8 attention layers, so at int8 in 16GB of RAM maybe you still get 64k-128k context.

The part I hate though is that I’d bet none of the performance claims are based on int8.

Why do we care about bf16 benchmarks when no one will be using that with this model?
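To get a feel for the context-length estimate, here is a rough sketch of the usual KV cache size formula for plain attention layers. Only the 8 attention layers come from the thread; the number of KV heads, the head dimension, and the cache precision are placeholder assumptions, so the output is illustrative rather than a statement about the actual model.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem):
    # Factor 2: one key vector and one value vector per token,
    # for every attention layer and every KV head.
    return 2 * n_attn_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem

N_ATTN_LAYERS = 8    # from the comment above
N_KV_HEADS = 8       # ASSUMPTION: not stated in the thread
HEAD_DIM = 128       # ASSUMPTION: not stated in the thread
BYTES_PER_ELEM = 2   # ASSUMPTION: cache kept in bf16

for n_tokens in (64 * 1024, 128 * 1024):
    size = kv_cache_bytes(N_ATTN_LAYERS, N_KV_HEADS, HEAD_DIM, n_tokens, BYTES_PER_ELEM)
    print(f"{n_tokens // 1024}k tokens: {size / 2**30:.1f} GiB")
```

With these placeholder values the cache costs 32KiB per token; sliding-window layers or cache quantization would change the result.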


>Even though it feels vibed

Where does it feel vibed?


I would love to see comparisons with AV1 on very low bitrates.

Return of the 8MB Shrek encodes?


There's a 64MB Game Boy Advance cartridge with Shrek on it [1]. It looks pretty horrible [2]. But the GBA only has 32KB of fast RAM, 256KB of slow RAM, and a 16MHz CPU.

[1] https://archive.org/details/Shrek-Video-GBA
[2] https://www.youtube.com/watch?v=CyOfPZQl4MI


Video resolution: 128x72, hahah. Late 90s RealPlayer postage stamp video is back! To its credit, that whole movie is probably smaller than RealPlayer itself was.

I love this, hope to see an AV2 version at 8MB.

I once watched an entire movie (around 90 mins) on a Nokia 6303. This reminds me of that.

6MB should be enough for everyone!

Almost half of the file is audio, so you're not saving as much.

Hopefully audio codecs will progress as well.

I noticed that too. When I tried extreme screen recording compression with AV1, audio became a noticeable part of the bottleneck.
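A quick back-of-the-envelope sketch shows why audio matters so much at these sizes. The 8MB size, the 90-minute runtime, and the even audio/video split are all assumptions picked to match the round numbers mentioned in the thread, not measurements of any actual encode.

```python
def avg_bitrate_kbps(size_bytes: float, duration_s: float) -> float:
    """Average bitrate in kilobits per second for a file of the given size."""
    return size_bytes * 8 / duration_s / 1000

FILE_BYTES = 8e6       # ASSUMPTION: "8MB" read as 8 * 10**6 bytes
DURATION_S = 90 * 60   # ASSUMPTION: a roughly 90-minute movie
AUDIO_SHARE = 0.5      # ASSUMPTION: "almost half" rounded to one half

total_kbps = avg_bitrate_kbps(FILE_BYTES, DURATION_S)
audio_kbps = total_kbps * AUDIO_SHARE
video_kbps = total_kbps - audio_kbps

print(f"total: {total_kbps:.1f} kbps")
print(f"audio: {audio_kbps:.1f} kbps, video: {video_kbps:.1f} kbps")
```

Under these assumptions the whole file averages about 12 kbps, so an audio track of only a few kbps already takes a large share of the budget.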


Is Opus being used for the audio, or is it not the solution for extreme lossy compression?

Opus is the default now, but there's more room, especially with neural codecs.

The giant Umarell in the background is a nice piece of furniture.

Edit: I noticed later that it was in Milan, so I guess it makes perfect sense.


Given your definition, what's the difference between AGI and superintelligence?

AGI should at least match, not surpass humans in every cognitive task.


AGI is just an "artificial" (a program) version of general intelligence (the general-purpose intelligence humans have).

Nothing in AGI implies "surpass humans in every cognitive task".

Not even "match in every cognitive task" is really required. There are humans that by definition have "general intelligence" that still don't match other humans "in every cognitive task", just in some.

Why should AGI need to match ALL humans in EVERY cognitive task then? An AGI just needs to be as good as an average (or even slightly below average) human, in human-like cognition.


I guess AGI is the breaking point and superintelligence is everything above?

I wonder, does that mean uBlock Origin has anti-anti-adblock functionality? (My guess is yes, but I wanted to take the opportunity to spell that word.)


It does, yes.


It's blocking all the way down.

