pbmango · 2026-05-28T16:55:37 1779987337

I can't help but think of iPhone updates since about 2018: "the thinnest, fastest, longest-battery-life iPhone ever." It seems mostly the same and I probably won't be able to tell the difference other than the name, but everyone buys it anyway.

This is good psychology for the labs. When Buffett invested in Apple he loved citing how most people would rather give up their second car than their iPhone.

krupan · 2026-05-28T20:38:46 1780000726

This is an incredibly refreshing take, thank you. It's about time someone admitted that we aren't on the verge of the Singularity with these LLMs. We've probably hit a local AI maximum here, and it could be another 10 to 20 years before we get another big breakthrough.

MangoCoffee · 2026-05-28T17:12:04 1779988324

ChatGPT came out in 2022. Back then it was just a chatbot. Now we have AI agents. What matters is how we use them and how the agents get better. That’s what will move AI forward.

zozbot234 · 2026-05-28T17:25:27 1779989127

An 'AI agent' is just a chatbot that is told to type commands on a REPL-like interface as part of its system prompt. It's still processing pure text-based requests and responses, they're just not restricted to natural language.
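
A minimal sketch of the loop described above, under loud assumptions: `fake_model`, `TOOLS`, `run_agent`, and the `RUN:` / `RESULT:` text convention are all hypothetical names made up for illustration, and the stub stands in for a real LLM call. It only shows the harness side (text in, text out, commands executed in between) and says nothing about whether the model was prompted or trained to behave this way.

```python
def fake_model(transcript):
    """Hypothetical stand-in for an LLM call: transcript text in, reply text out."""
    if "RESULT:" not in transcript:
        # No tool output yet, so the "model" asks the harness to run a command.
        return "RUN: add 2 3"
    # A tool result is present, so answer in natural language using it.
    result = transcript.rsplit("RESULT: ", 1)[1].strip()
    return f"The answer is {result}."


# Hypothetical tool table; a real harness might expose a shell or an API instead.
TOOLS = {"add": lambda a, b: str(int(a) + int(b))}


def run_agent(task, model=fake_model, max_steps=5):
    """Feed the growing transcript back to the model until it stops issuing commands."""
    transcript = f"SYSTEM: To use a tool, reply 'RUN: <tool> <args>'.\nUSER: {task}\n"
    for _ in range(max_steps):
        reply = model(transcript)
        transcript += f"ASSISTANT: {reply}\n"
        if not reply.startswith("RUN: "):
            return reply  # Plain text reply: the agent is done.
        name, *args = reply[len("RUN: "):].split()
        transcript += f"RESULT: {TOOLS[name](*args)}\n"
    return "step limit reached"


answer = run_agent("What is 2 + 3?")
print(answer)
```

Everything the stub sees and produces is plain text; the only "agentic" part is the harness deciding to execute a reply instead of displaying it.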

arbitrandomuser · 2026-05-28T17:44:39 1779990279

A lot of people don't know this. Also, the chatbot (ChatGPT) itself is a next-token predictor (the GPT) that's been given an initial text that says "pretend to be a chatbot..." and asked to complete it. The coherent chatting behaviour is something that's emergent.

Later on someone figured out that if you asked it to output its reasoning before it gave a response, its output would have more logical coherence, as though the reasoning tokens functioned as a scratch space for it to work on.

In the end it's all next-token prediction.
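
A toy sketch of what "complete the text one token at a time" means, assuming a bigram counter in place of the neural network a real GPT uses. The names `train_bigram` and `complete` and the tiny corpus are hypothetical, chosen only to show a chat transcript being continued as ordinary text.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    tokens = text.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def complete(counts, prompt, max_new_tokens=8):
    """Greedy decoding: repeatedly append the most frequent next token."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # The last token was never seen with a successor.
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)


# The "chat" is just text with speaker labels; the predictor continues it.
corpus = "User: hello Bot: hi there User: hello Bot: hi there"
model = train_bigram(corpus)
transcript = complete(model, "User: hello Bot:", max_new_tokens=2)
print(transcript)
```

The predictor has no notion of a conversation; the reply appears only because the prompt ends where a reply would normally start.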

hellohello2 · 2026-05-28T17:48:04 1779990484

No, chatbots are LLMs trained for question-answering through RLHF (it's not just a prompt). But yes, if you just zero-shot prompt a bare LLM you can still "talk to it", and you are correct on everything else as far as I know.

fragmede · 2026-05-28T20:22:44 1779999764

A lot of people don't know this. Also, the human brain is a squishy lump of meat that's been given a childhood and the prompt "act like an adult", and asked to behave. The coherent chatting behaviour is something that's emergent.

Later on someone figured out that if you shove Adderall in it and ask it to think before it speaks, its output would have more logical coherence, as though the concentration drug functioned as a scratch space for it to work on.

In the end it's a squishy lump of meat.

NateEag · 2026-05-29T00:17:39 1780013859

We know from living as humans that we have experiences.

We have no such evidence that LLMs do.

That's a pretty significant difference between the next-token predictor and the squishy lump of meat.

Alex_L_Wood · 2026-05-28T22:24:31 1780007071

How much must one tie their self-worth to a chatbot to debase themselves like that? To think that a winner in the arms and intelligence race of the animal kingdom, a member of the species that made this chatbot, would put themselves down like that in defense of thoughtless silicon is absolutely laughable and depressing at the same time.

fragmede · 2026-05-28T23:18:23 1780010303

I'm merely pointing out the logical fallacy of thinking complex systems can't arise from simpler components in an obtuse fashion. Ants are stupid individually, yet they're able to create giant structures in the wild. Hating on AI and calling it next word prediction isn't going to save anyone's jobs. Organizing will. Voting will.

hellohello2 · 2026-05-28T17:45:57 1779990357

They are chatbots trained for tool use; it's not just a prompt.

sigmarule · 2026-05-28T19:43:30 1779997410

An AI agent and a chatbot are both applications built using LLM inference as a primitive.

furyofantares · 2026-05-28T19:07:25 1779995245

Yeah and a car is just an engine connected to wheels.

smj-edison · 2026-05-28T21:42:10 1780004530

Yeah. LLMs are fundamentally a batch-based system, and we smear a veneer of liveness and autonomy on top.

MattDamonSpace · 2026-05-28T17:46:37 1779990397

Not even 4 years old yet. This tech curve has been insane

rzmmm · 2026-05-28T21:34:51 1780004091

I still use LLMs in much the same way as when ChatGPT launched. There has been progress, but I think the real leap was 2020-2022.

SoftTalker · 2026-05-28T18:05:08 1779991508

Not even the typical lifecycle of a corporate PC or laptop. It is pretty wild.

gaflo · 2026-05-29T01:41:25 1780018885

If you upgrade your 8-year-old phone, the many incremental upgrades will be very noticeable. From my personal experience the LLM space is also moving at a faster pace than the phone industry at the moment, but at least from a financial perspective I would expect it to slow down sooner rather than later.

toyetic · 2026-05-28T19:56:47 1779998207

This was my exact thought as well. I think mythos could still be a huge leap, but especially as IPOs get closer it seems like we're getting closer to the iPhone 10 moment where anything after is just improvements at the edge.

But (maybe because it was hardware) that took 10ish years, while it seems like the slowdown here only took about 4.

slashdave · 2026-05-29T05:40:03 1780033203

Are we supposed to have two cars?

pbmango · 2026-05-20T21:54:59 1779314099

It sounded like Anthropic is getting capacity from Colossus 1, not Colossus 2. The initial Colossus capex was under $5B, making that an even more astounding payoff.

Edit: The S-1 states both are being leased, so the $20-25B initial investment is probably more relevant.

TheAlchemist · 2026-05-20T21:57:34 1779314254

The S-1 states that it gets capacity from both Colossus 1 and Colossus 2.

gjsman-1000 · 2026-05-20T21:55:53 1779314153

... and a sign Anthropic couldn't find enough compute anywhere else, so they had to bite the bullet. Interesting.

pbmango · 2026-03-03T19:14:29 1772565269

I imagine a huge proportion of their users are under 30. The prompt examples included even use the telltale all lowercase (though apparently sama types like this too).

This is probably less pandering to Gen Z and more speaking their users' language.

pbmango · 2026-02-24T03:24:19 1771903459

This is very interesting. I don't see much discussion of interpretability in the day-to-day discourse of AI builders. I wonder if everyone assumes it to be either solved or too far out of reach to bother stopping and thinking about.

yogurt-male · 2026-02-26T18:20:10 1772130010

Mostly out of reach. There is a ton of research on figuring out how to do this coming out every day, including both proposals of new ways to do things and (often strong) critiques of old or recently proposed ways of doing things. Interpretability (esp. for large, modern models) is very, very far from being a solved problem.

adebayoj · 2026-02-24T08:22:07 1771921327

Most interpretability techniques have yet to be shown to be useful for everyday model pipelines. However, the field is working hard to change this.

pbmango · 2026-02-19T18:55:42 1771527342

Along these same lines, I have been trying to become better at knowing when my work could benefit from reversion to the "boring" and general mean and when outsourcing thought or planning would cause a reversion to the mean (downwards).

This echoes the comments here about enjoying not writing boilerplate. The catch is that our minds are programmed to offload work when we can, and redirecting all the effort saved on boilerplate into going even deeper on the parts of the problem that benefit from original hard thinking is rare. It is much easier to get sucked into creating more boilerplate, and all the gamification of Claude Code and the incentives of service providers increase this.

pbmango · 2025-07-08T16:22:55 1751991775

As the founder of another product in this space - this is super impressive and well built. Great demo video and congrats on top of HN! Getting this smooth UX and data behind the scenes is not easy.

pbmango · 2025-06-12T18:34:18 1749753258

https://www.canva.com/design/DAGqKquGD-c/xtRObgH1r_4RoulPAys...

pbmango · on April 28, 2025

It is also possible that this "world view tuning" may have just been the manifestation of how these models gained public attention. Whether intentional or not, seeing the Tiananmen Square reposts across all social feeds may have done more to spread awareness of these models' technical merits than the technical merits themselves would have. This is certainly true for how consumers learned about free DeepSeek, and it fits perfectly into how new AI releases are turned into high click-through social media posts.

refulgentis · on April 28, 2025

I'm curious if there's any data behind that conclusion. It's hard for me to buy "They did the censorship training on DeepSeek because they knew consumers would love free DeepSeek after seeing screenshots of its Tiananmen censorship".

(the steelman here, ofc, is "the screenshots drove buzz which drove usage!", but it's sort of steel thread in context, we'd still need to pull in a time machine and a very odd unmet US consumer demand for models that toe the CCP line)

pbmango · on April 28, 2025

> Whether intentional or not

I am not claiming it was intentional, but it certainly magnified the media attention. Maybe luck and not 4d chess.

pbmango · on April 14, 2025

I think an under appreciated reality is that all of the large AI labs and OpenAI in particular are fighting multiple market battles at once. This is coming across in both the number of products and the packaging.

1, to win consumer growth they have continued to benefit from hyper-viral moments; lately that was image generation in 4o, which likely was technically possible a long time before it launched. 2, for enterprise workloads and large API use, they seem to have focused less lately, but the pricing of 4.1 is clearly an answer to Gemini, which has been winning on ultra-high volume and consistency. 3, for full frontier benchmarks they pushed out 4.5 to stay SOTA and attract the best researchers. 4, on top of all that they had to, and did, quickly answer the reasoning promise and the DeepSeek threat with faster and cheaper o models.

They are still winning many of these battles but history highlights how hard multi front warfare is, at least for teams of humans.

spiderfarmer · on April 14, 2025

On that note, I want to see benchmarks for which LLMs are best at translating between languages. To me, it's an entire product category.

pbmango · on April 14, 2025

There are probably many more small battles being fought or emerging. I think voice and PDF parsing are growing battles too.

oezi · on April 15, 2025

I would love to see a stackexchange-like site where humans ask questions and we get to vote on the reply by various LLMs.

anotherengineer · on April 15, 2025

is this like what you're thinking of? https://lmarena.ai

oezi · on April 15, 2025

Kind of. But lmarena.ai has no way to see results to questions people asked and it only lets you look at two responses side by side.

kristianp · on April 14, 2025

I agree. 4.1 seems to be a release that addresses shortcomings of 4o in coding compared to Claude 3.7 and Gemini 2.0 and 2.5

pbmango · on Feb 25, 2025

Growing up in Buffalo, New York, I only once as a kid saw one flying, while on a camping trip in a remote state park. Now, you see one almost every day on the coastline of Lake Erie. They are so much bigger than other birds that you will notice even if you are not on the lookout. Their scale is astounding compared to seagulls.

They have also come back to the Potomac and Washington DC which is nice.