netdur · 2026-06-06T13:06:11 1780751171

Yes, it is an issue of scale. Google had to restrict usage because the hardware is not available, regardless of what kind of hardware that is.

netdur · 2026-06-06T10:06:07 1780740367

That is one of the best answers.

netdur · 2026-06-05T17:20:56 1780680056

Not sure I understand you, but 4Q and QAT 4Q are different.

refulgentis · 2026-06-05T17:27:07 1780680427

It's super annoying when you have products that utilize these because there's...4? releases in 3 weeks?

- Gemma 4 2B/4B/27BE3B/31B

- Gemma 4 2B/4B/27BE3B/31B x "assistant" / MTP drafter models (i.e. multitoken prediction)

- Gemma 4 12B (2 days ago? 1?)

- Gemma 4 QAT 2B/4B/12B/27BE3B/31B x "assistant" models (i.e. multitoken prediction)

It probably sounds silly and really whiny in the abstract. It just causes a ton of work / confusion downstream that feels unnecessary.

Extremely glad for the output, not glad to have to chase it.

ex. llama.cpp currently supports the originals but not the MTP predictors, but there is a patch for the MTP predictors but not for the small MoE models, and I think it supports the 12B but maybe not media for it yet, and now we have these too, and the blog says there's GGUFs (llama.cpp models) but there aren't any in the 12? repos I clicked through. And ~every consumer-facing local LLM app is built on llama.cpp or a fork of it.

Also if anyone at Google is taking feedback over to b/ or product, pleaseeee stop the "E"2B "E"4B thing, unless it's actually taking up less RAM on Android during CPU inference. I can't tell if I need to treat the 4B like an 8B (i.e. beyond most consumer hardware without a GPU) or a 4B (i.e. will run on most consumer hardware since 2021)

EDIT: And, yes, the QAT 12B x mmproj does not work with llama.cpp. I'm glad there's people who have the luxury of not having to, well, actually use these and treat me as whining :) I'll need to schedule another 4-8 hours of work for the 4th time, no fun!

ddarolfi · 2026-06-05T17:35:59 1780680959

These models aren't products? They are open source ish (open weight I guess), research outputs. While the naming scheme may be confusing, it is relevant and important. I believe it's on you to understand it.

sumedh · 2026-06-06T00:20:16 1780705216

> I believe it's on you to understand it.

This is exactly why Google has 10 messenger Apps.

nolist_policy · 2026-06-06T07:58:09 1780732689

Google released their latest messenger app 9 years ago. https://en.wikipedia.org/wiki/Google_Chat

refulgentis · 2026-06-05T18:10:57 1780683057

I understand it. :)

And you're absolutely right to point out they aren't products - I hoped that was clear - when you're building a product with them, you end up having to do the same build loop 4 times, in this instance :)

overfeed · 2026-06-05T19:34:35 1780688075

You can stop after the first one. Choosing to repeat the process is on you, and probably because you see some benefit in using the variant(s) you build on top of.

ddarolfi · 2026-06-05T20:01:18 1780689678

Yes my framing was a little confusing. You were clear in that you are building products on them. I was more saying that because these gemma models are not products, and instead research outputs, the naming scheme should be more scientific rather than consumer friendly.

satvikpendem · 2026-06-05T17:37:28 1780681048

Just use Unsloth Studio; it supports them all.

netdur · 2026-06-05T17:16:32 1780679792

Had a good run with Gemma 4 E2B Unsloth 4Q: https://youtube.com/shorts/XLsAnz5aAAI

The E4B model doesn’t fit on my phone's TPU, so it swaps to RAM. The QAT version means more accuracy, which is good!

ComputerGuru · 2026-06-05T23:30:27 1780702227

How were you getting anything useful out of that? We found the (unquantized!) E2B model to be completely useless at even the simplest real-world classification tasks.

prism56 · 2026-06-05T20:34:46 1780691686

How do you know it swaps to ram vs on the TPU?

Would be interested in testing this on my Pixel.

netdur · 2026-06-06T13:08:10 1780751290

Because the TPU has 2 GB, and weights + context need more than that.
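A rough sketch of that reasoning, in case it helps. Only the 2 GB budget comes from the comment above; the parameter counts, the 4-bit width, and the 0.5 GB context overhead are illustrative placeholders, not measured values for any Gemma model.

```python
GB = 10**9  # decimal gigabyte, used only for this back-of-envelope estimate


def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def fits_in_budget(n_params: int, bits_per_weight: float,
                   context_bytes: float, budget_bytes: float) -> bool:
    """True if weights plus context (KV cache etc.) fit in the accelerator memory."""
    return weight_bytes(n_params, bits_per_weight) + context_bytes <= budget_bytes


budget = 2 * GB          # accelerator memory quoted in the comment
context = 0.5 * GB       # hypothetical context overhead, not a measured value

# Hypothetical 2B- and 4B-parameter models at 4 bits per weight.
small_fits = fits_in_budget(2_000_000_000, 4, context, budget)
large_fits = fits_in_budget(4_000_000_000, 4, context, budget)

print(f"2B @ 4-bit fits: {small_fits}")
print(f"4B @ 4-bit fits: {large_fits}")
```

Under these assumptions the smaller model stays within the budget and the larger one does not, which matches the behavior described; real numbers depend on the actual parameter count and context length.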

netdur · 2026-06-03T13:42:27 1780494147

That's great, spotting Opus text got easier

netdur · 2026-05-21T12:16:58 1779365818

I find helpful ads on Google Search sometimes, and it can be the easiest way to get results, but most of the time, ads (and SEO) ruin search accuracy to the point that it's becoming totally useless

netdur · 2026-05-10T20:33:53 1778445233

I am working on https://vibu.app, which is a free digital voucher,

and for fun, I am building yet another programming language!

netdur · 2026-05-10T10:01:24 1778407284

I mistakenly thought it was about Rotten Tomatoes, and I started thinking about how a movie like Michael ranked badly. The critics missed the whole point of watching a movie: to be entertained. Sadly, here on HN, we sometimes miss the point too when certain names are involved.

netdur · 2026-05-09T13:43:56 1778334236

Then do not use Android; use some Chinese phone or an iPhone.

netdur · 2026-05-05T19:32:01 1778009521

I am getting 21 t/s on the Fold 7. 21 x 1.8 = 37.8 t/s compared to the M1 Max's 54 t/s; that is impressive.
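The arithmetic above, spelled out as a quick check. All three numbers (21 t/s, the 1.8 multiplier, and 54 t/s) are taken from the comment as given; the variable names are just for illustration.

```python
fold7_tps = 21.0      # tokens/s reported on the Fold 7
multiplier = 1.8      # scaling factor used in the comment
m1_max_tps = 54.0     # tokens/s quoted for the M1 Max

projected_tps = fold7_tps * multiplier
ratio = projected_tps / m1_max_tps

print(f"projected: {projected_tps:.1f} t/s")
print(f"share of M1 Max throughput: {ratio:.0%}")
```

So the scaled phone figure comes out to roughly 70% of the quoted M1 Max throughput.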