Hacker News | mixtureoftakes's comments

if you're not checking the citations in the paper you're publishing AND you're trusting a non-SOTA, hallucination-prone AI model to come up with sources for it, it's probably best for everyone that this paper isn't published.

yes, there will be rare exceptions, but in general I feel like this is a really good addition.
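Spot-checking citations doesn't even take much effort to automate; a minimal sketch, assuming the references cite DOIs (the regex and the Crossref lookup here are illustrative, not a complete checker):

```python
import re
import urllib.request

# Standard DOI shape: "10." + 4-9 digit registrant code + "/" + suffix.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")

def extract_dois(text: str) -> list[str]:
    """Pull DOI-looking strings out of a references section."""
    return [d.rstrip(".,;") for d in DOI_RE.findall(text)]

def doi_resolves(doi: str) -> bool:
    """Ask the Crossref REST API whether the DOI actually exists."""
    url = f"https://api.crossref.org/works/{doi}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except Exception:
        return False

refs = "See Smith et al., doi:10.1000/xyz123. Also 10.1038/nature12373."
print(extract_dois(refs))
# → ['10.1000/xyz123', '10.1038/nature12373']
```

A nonexistent DOI is the cheapest possible hallucination to catch; titles and author lists take more work, but this alone would flag a lot of fabricated references.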


> non SOTA, hallucination prone ai model

What SOTA models are not hallucination prone?


7B Mistral is quite outdated. On a 12GB 4070 you can run Qwen 3.5 9B Q4_K_M or Qwen 3.6 35B; the latter will be a lot smarter but also a lot slower due to RAM offload.

Try both in LM Studio; they really are surprisingly capable.


I have 80GB of RAM, but it's slow: only 2400MHz despite being DDR4. I think it's either capped by the i9 CPU, or this specific Asus mobo just sucks.

Tried all the usual stuff: BIOS settings, voltage tweaks.


Gemma 4 26B-A4B might be interesting to try on your machine. The latest optimizations make MoE models work pretty nicely on setups like that with a decent GPU and lots of slowish RAM. I have a 16gb GPU and 64gb of 3200mhz DDR4 and get 15-20 tokens/sec out of that model with zero finagling or tweaking. I’ve been very impressed by it, even having run just about every other open weight model that would fit on my machine over the last few years.
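For intuition on why MoE helps here: decode speed is roughly memory bandwidth divided by bytes read per token, so fewer active parameters means faster generation. A rough sketch, where the bandwidth figure and bytes-per-parameter are ballpark assumptions, not measurements:

```python
# Token generation is mostly memory-bandwidth bound:
# tokens/sec ≈ usable bandwidth / bytes read per token.

def tokens_per_sec(bandwidth_gb_s: float, active_params_b: float,
                   bytes_per_param: float = 0.55) -> float:
    """Estimate decode speed for a quantized model.

    bandwidth_gb_s: effective memory bandwidth in GB/s (assumption)
    active_params_b: parameters touched per token, in billions
    bytes_per_param: ~0.55 for a Q4_K_M-style quant (assumption)
    """
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Dense 9B model on dual-channel DDR4-2400 (~38 GB/s theoretical):
dense = tokens_per_sec(38, 9)
# MoE with ~4B active parameters on the same memory:
moe = tokens_per_sec(38, 4)
print(f"dense 9B: ~{dense:.0f} tok/s, MoE 4B-active: ~{moe:.0f} tok/s")
# → dense 9B: ~8 tok/s, MoE 4B-active: ~17 tok/s
```

These are pure-RAM numbers; offloading the shared layers to the GPU pushes real throughput above the estimate, which is consistent with the 15-20 tok/s figure.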

that seems slow? 15-20; I was expecting 50-60 like Mistral, although I haven't measured that on my setup yet

I've been asking other people but what do you use it for?


anthropic really needs to just make a great, personalized customer support experience where you get a couple dollars' worth of opus credits and an agent that has some actual authority and ability to help with your issue.

"it couldn't be that simple because xyz" why not? I've yet to see any big AI company actually try this.


Great points. Using their printer "rooted" or with custom firmware seems like a decent compromise to me, kind of like what GrapheneOS is doing with Pixels.

If I had an actual need that wasn't being met, I might buy one of their printers just to root and run with custom firmware. I might just do it for the fun of it. Even with tariffs their printers are only running around $220 at Best Buy.

However, even that sounds suspiciously like a project in and of itself. I haven't had time to design and print anything in the last month. So I expect I'll keep rolling along like I am. Things could always change, though.


wow!

curious about your workflow for running all these accounts. different harnesses in parallel? manually switching in codex? 5.5pro only?

what works for you?


I wrote up a bit about my workflow here[0][1]. I'm using conductor.build to manage multiple codex sessions at once. When I hit the rate limit, I'm using codex-auth[2] to switch codex accounts.

[0] https://malisper.me/pgrust-rebuilding-postgres-in-rust-with-... [1] https://malisper.me/pgrust-update-at-67-postgres-compatibili... [2] https://github.com/loongphy/codex-auth


please, sign up for a paid plan of either chatgpt or claude. gemini, while close, is still noticeably behind

you deserve opinions shaped by interactions with the best tools that are out there.


Gemini feels deep and philosophical, especially for product management. Tell it you're a product manager and that you're a team of two.

But regular reminder - All LLMs can be wrong all the time. I only work with LLMs in domains I'm expert in OR I have other sources to verify their output with utmost certainty.


Or when you don't care about results being very correct.

When I'm cooking meatballs with sauce and the recipe calls for frying them, I'll have an LLM guesstimate how long and which program to use in an air fryer to mimic the frying pan, based on a picture of the balls in a Pyrex. So I can just move on with the sauce, instead of spending time browsing websites and stressing about getting it perfect.

I used to hate these non-deterministic instructions, now I treat it as their own game. When I will publish my first recipe, I'll have an LLM randomize the ingredient amounts, round them up to some imprecise units and also randomize the times. Psychologists say we artists need to participate and I WILL participate.
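The randomization bit is easy to script; a playful sketch with made-up ingredients and arbitrary rounding steps (all hypothetical, pick whatever "imprecise units" amuse you):

```python
import random

def fuzz_recipe(ingredients, jitter=0.25, seed=None):
    """Scale each amount by a random factor within +/- jitter,
    then round to a deliberately imprecise step."""
    rng = random.Random(seed)
    fuzzed = {}
    for name, (amount, unit) in ingredients.items():
        scaled = amount * rng.uniform(1 - jitter, 1 + jitter)
        step = 25 if unit == "g" else 1   # coarse rounding per unit
        fuzzed[name] = (round(scaled / step) * step, unit)
    return fuzzed

recipe = {"ground beef": (500, "g"),
          "breadcrumbs": (75, "g"),
          "eggs": (1, "pcs")}
print(fuzz_recipe(recipe, seed=42))
```

Randomize the times the same way and every reader gets their own slightly wrong recipe to participate in.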


> I only work with LLMs in domains I'm expert in

This. It should become a general rule for any non-trivial use of LLMs in a professional setting.


LLMs can also be really good in fields where you are not an expert. You just need to be very aware of your limitations, and start a parallel conversation so one agent fact-checks the other.
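The cross-checking pattern itself is simple to wire up; a toy sketch where `ask` is a canned stand-in (swap in whatever chat API you actually use):

```python
# Toy sketch of one agent fact-checking another.
# `ask` returns canned strings here purely for illustration.

def ask(model: str, prompt: str) -> str:
    canned = {"model-a": "The Eiffel Tower is 330 m tall.",
              "model-b": "Claim checks out; the height is ~330 m."}
    return canned[model]

def cross_check(question: str) -> tuple[str, str]:
    answer = ask("model-a", question)
    critique = ask("model-b",
                   f"Fact-check this answer to {question!r}: {answer} "
                   "List any claims that are wrong or unverifiable.")
    return answer, critique

ans, crit = cross_check("How tall is the Eiffel Tower?")
print(ans)
print(crit)
```

The second conversation only ever sees the first one's output, so it isn't anchored to the same reasoning path, which is the whole point.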

Seriously, it’s not worth reaching for less intelligence. Use Extended Pro 100% of the time for things you’d spend the amount of time GP spent writing their post.

Gemini is certainly not behind Claude in terms of physics.

Agreed, Gemini is clearly a capable model, but the tool use is lagging behind the other two. Ironically, it regularly gets things wrong (e.g. the current version of some software) because of an unwillingness to use web search.

ChatGPT and Gemini are actually fairly comparable.

Claude has been utterly useless with most math problems in my experience because, much like less capable students, it tends to get overly bogged down in tedious details before it gets to the big picture. That's great for programming, not so much for frontier math. If you're giving it little lemmas, then sure it's great, but otherwise you're just burning tokens.


more like weekly or almost daily; gpt 5.5 was literally 12 hours ago


who are you.


unpopular opinion but i think it's written quite well


I don't think that's unpopular, it is pretty well written. But the "I believe" section is extraordinarily hard to believe given Altman's history.

> Working towards prosperity for everyone, empowering all people

> We have to get safety right

> AI has to be democratized; power cannot be too concentrated

None of these statements, IMO, reflect his actions over the past 5 years.

> we urgently need a society-wide response to be resilient to new threats. This includes things like new policy to help navigate through a difficult economic transition in order to get to a much better future

I agree with this, but there is a near 0% chance of that happening anytime soon in the US. I think he probably is aware of this.

Just my opinion, but it comes off as very insincere.

To be clear, what happened is still awful and there's absolutely no justification for it.


Yes, clearly not written with his own product.


If that's the case, why doesn't he trust his own product enough to write this?


He doesn't trust it for anything else either as far as I can tell. In an interview he's boasted about how he uses a paper notebook for everything all day.


it's "written well" but it's not at all a smart piece of writing. leading with a photo of a cute baby before launching into an extended defense of one's own integrity is so obvious as to be insulting


Perhaps by ChatGPT


It seems a bit stilted to be LLM'd.


how can I do this with it?


It says it in the post: there’s an action you can map to keys for “move right a space (no animation)”.

