Hacker News | mixtureoftakes's comments

if you're not checking the citations in the paper you're publishing AND you're trusting a non-SOTA, hallucination-prone AI model to come up with sources for it, it's probably best for everyone that this paper isn't published.

yes, there will be rare exceptions, but in general I feel like this is a really good addition.
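Spot-checking citations doesn't even take much effort to automate; a minimal sketch, assuming the references cite DOIs (the regex and the Crossref lookup here are illustrative, not a complete checker):

```python
import re
import urllib.request

# Standard DOI shape: "10." + 4-9 digit registrant code + "/" + suffix.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")

def extract_dois(text: str) -> list[str]:
    """Pull DOI-looking strings out of a references section."""
    return [d.rstrip(".,;") for d in DOI_RE.findall(text)]

def doi_resolves(doi: str) -> bool:
    """Ask the Crossref REST API whether the DOI actually exists."""
    url = f"https://api.crossref.org/works/{doi}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except Exception:
        return False

refs = "See Smith et al., doi:10.1000/xyz123. Also 10.1038/nature12373."
print(extract_dois(refs))
# → ['10.1000/xyz123', '10.1038/nature12373']
```

A nonexistent DOI is the cheapest possible hallucination to catch; titles and author lists take more work, but this alone would flag a lot of fabricated references.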


> non SOTA, hallucination prone ai model

What SOTA models are not hallucination prone?


7B Mistral is quite outdated. On a 12GB 4070 you can run Qwen 3.5 9B Q4_K_M or Qwen 3.6 35B; the latter will be a lot smarter but also a lot slower due to RAM offload.

Try both in LM Studio; they really are surprisingly capable.


I have 80GB of RAM, but it's slow: only 2400MHz despite being DDR4. I think it's either capped by the i9 CPU, or this specific Asus mobo just sucks.

Tried all the usual stuff: BIOS settings, voltage tweaks.


Gemma 4 26B-A4B might be interesting to try on your machine. The latest optimizations make MoE models work pretty nicely on setups like that with a decent GPU and lots of slowish RAM. I have a 16gb GPU and 64gb of 3200mhz DDR4 and get 15-20 tokens/sec out of that model with zero finagling or tweaking. I’ve been very impressed by it, even having run just about every other open weight model that would fit on my machine over the last few years.
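For intuition on why MoE helps here: decode speed is roughly memory bandwidth divided by bytes read per token, so fewer active parameters means faster generation. A rough sketch, where the bandwidth figure and bytes-per-parameter are ballpark assumptions, not measurements:

```python
# Token generation is mostly memory-bandwidth bound:
# tokens/sec ≈ usable bandwidth / bytes read per token.

def tokens_per_sec(bandwidth_gb_s: float, active_params_b: float,
                   bytes_per_param: float = 0.55) -> float:
    """Estimate decode speed for a quantized model.

    bandwidth_gb_s: effective memory bandwidth in GB/s (assumption)
    active_params_b: parameters touched per token, in billions
    bytes_per_param: ~0.55 for a Q4_K_M-style quant (assumption)
    """
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Dense 9B model on dual-channel DDR4-2400 (~38 GB/s theoretical):
dense = tokens_per_sec(38, 9)
# MoE with ~4B active parameters on the same memory:
moe = tokens_per_sec(38, 4)
print(f"dense 9B: ~{dense:.0f} tok/s, MoE 4B-active: ~{moe:.0f} tok/s")
# → dense 9B: ~8 tok/s, MoE 4B-active: ~17 tok/s
```

These are pure-RAM numbers; offloading the shared layers to the GPU pushes real throughput above the estimate, which is consistent with the 15-20 tok/s figure.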

that seems slow? 15-20; I was expecting 50-60 like Mistral, although I haven't measured that on my setup yet

I've been asking other people but what do you use it for?


anthropic really needs to just make a great, personalized customer support experience where you get a couple dollars' worth of opus credits and an agent that has some actual authority and ability to help with your issue.

"it couldn't be that simple because xyz" why not? I've yet to see any big AI company actually try this.


Great points. Using their printer "rooted" or with custom firmware seems like a decent compromise to me, kind of like what GrapheneOS is doing with Pixels.

If I had an actual need that wasn't being met, I might buy one of their printers just to root and run with custom firmware. I might just do it for the fun of it. Even with tariffs their printers are only running around $220 at Best Buy.

However, even that sounds suspiciously like a project in and of itself. I haven't had time to design and print anything in the last month. So I expect I'll keep rolling along like I am. Things could always change, though.


wow!

curious about your workflow for running all these accounts. different harnesses in parallel? manually switching in codex? 5.5pro only?

what works for you?


I wrote up a bit about my workflow here[0][1]. I'm using conductor.build to manage multiple codex sessions at once. When I hit the rate limit, I'm using codex-auth[2] to switch codex accounts.

[0] https://malisper.me/pgrust-rebuilding-postgres-in-rust-with-... [1] https://malisper.me/pgrust-update-at-67-postgres-compatibili... [2] https://github.com/loongphy/codex-auth


please, sign up for a paid plan of either chatgpt or claude. gemini, while close, is still noticeably behind

you deserve opinions shaped by interactions with the best tools that are out there.


Gemini feels deep and philosophical, especially for product management. Tell it you're a product manager and that you're a team of two.

But regular reminder - All LLMs can be wrong all the time. I only work with LLMs in domains I'm expert in OR I have other sources to verify their output with utmost certainty.


Or when you don't care about results being very correct.

When I'm cooking meatballs with sauce and the recipe calls for frying them, I'll have an LLM guesstimate how long and which program to use in an air fryer to mimic the frying pan, based on a picture of the balls in a Pyrex. So I can just move on with the sauce, instead of spending time browsing websites and stressing about getting it perfect.

I used to hate these non-deterministic instructions, now I treat it as their own game. When I will publish my first recipe, I'll have an LLM randomize the ingredient amounts, round them up to some imprecise units and also randomize the times. Psychologists say we artists need to participate and I WILL participate.
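The randomization bit is easy to script; a playful sketch with made-up ingredients and arbitrary rounding steps (all hypothetical, pick whatever "imprecise units" amuse you):

```python
import random

def fuzz_recipe(ingredients, jitter=0.25, seed=None):
    """Scale each amount by a random factor within +/- jitter,
    then round to a deliberately imprecise step."""
    rng = random.Random(seed)
    fuzzed = {}
    for name, (amount, unit) in ingredients.items():
        scaled = amount * rng.uniform(1 - jitter, 1 + jitter)
        step = 25 if unit == "g" else 1   # coarse rounding per unit
        fuzzed[name] = (round(scaled / step) * step, unit)
    return fuzzed

recipe = {"ground beef": (500, "g"),
          "breadcrumbs": (75, "g"),
          "eggs": (1, "pcs")}
print(fuzz_recipe(recipe, seed=42))
```

Randomize the times the same way and every reader gets their own slightly wrong recipe to participate in.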


> I only work with LLMs in domains I'm expert in

This. It should become a general rule for any non-trivial use of LLMs in a professional setting.


LLMs can also be really good in fields where you are not an expert. You just need to be very aware of your limitations, and start a parallel conversation so one agent fact-checks the other.
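The cross-checking pattern itself is simple to wire up; a toy sketch where `ask` is a canned stand-in (swap in whatever chat API you actually use):

```python
# Toy sketch of one agent fact-checking another.
# `ask` returns canned strings here purely for illustration.

def ask(model: str, prompt: str) -> str:
    canned = {"model-a": "The Eiffel Tower is 330 m tall.",
              "model-b": "Claim checks out; the height is ~330 m."}
    return canned[model]

def cross_check(question: str) -> tuple[str, str]:
    answer = ask("model-a", question)
    critique = ask("model-b",
                   f"Fact-check this answer to {question!r}: {answer} "
                   "List any claims that are wrong or unverifiable.")
    return answer, critique

ans, crit = cross_check("How tall is the Eiffel Tower?")
print(ans)
print(crit)
```

The second conversation only ever sees the first one's output, so it isn't anchored to the same reasoning path, which is the whole point.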

Seriously, it’s not worth reaching for less intelligence. Use Extended Pro 100% of the time for things you’d spend the amount of time GP spent writing their post.

Gemini is certainly not behind Claude in terms of physics.

Agreed, Gemini is clearly a capable model, but the tool use is lagging behind the other two. Ironically, it regularly gets things wrong (e.g. the current version of some software) because of an unwillingness to use web search.

ChatGPT and Gemini are actually fairly comparable.

Claude has been utterly useless with most math problems in my experience because, much like less capable students, it tends to get overly bogged down in tedious details before it gets to the big picture. That's great for programming, not so much for frontier math. If you're giving it little lemmas, then sure it's great, but otherwise you're just burning tokens.


more like weekly or almost daily; gpt 5.5 was literally 12 hours ago


who are you.


unpopular opinion but i think it's written quite well


I don't think that's unpopular, it is pretty well written. But the "I believe" section is extraordinarily hard to believe given Altman's history.

> Working towards prosperity for everyone, empowering all people

> We have to get safety right

> AI has to be democratized; power cannot be too concentrated

None of these statements, IMO, reflect his actions over the past 5 years.

> we urgently need a society-wide response to be resilient to new threats. This includes things like new policy to help navigate through a difficult economic transition in order to get to a much better future

I agree with this, but there is a near 0% chance of that happening anytime soon in the US. I think he probably is aware of this.

Just my opinion, but it comes off as very insincere.

To be clear, what happened is still awful and there's absolutely no justification for it.


Yes, clearly not written with his own product.


If that's the case, why doesn't he trust his own product enough to write this?


He doesn't trust it for anything else either as far as I can tell. In an interview he's boasted about how he uses a paper notebook for everything all day.


it's "written well" but it's not at all a smart piece of writing. leading with a photo of a cute baby before launching into an extended defense of one's own integrity is so obvious as to be insulting


Perhaps by ChatGPT


It seems a bit stilted to be LLM'd.


how can I do this with it?


It says it in the post: there’s an action you can map to keys for “move right a space (no animation)”.

