pbmango's comments

I can't help but think of iPhone updates since about 2018. The thinnest, fastest, longest-battery-life iPhone ever. It seems mostly the same and I probably won't be able to tell the difference other than the name, but everyone buys it anyway.

This is good psychology for the labs. When Buffett invested in Apple he loved citing how most people would rather give up their second car than their iPhone.


This is an incredibly refreshing take, thank you. It's about time someone admitted that we aren't on the verge of the Singularity with these LLMs. We've probably hit a local AI maximum here and it could be another 10 to 20 years before we get another big breakthrough.

ChatGPT came out in 2022. Back then it was just a chatbot. Now we have AI agents. What matters is how we use them and how the agents get better. That’s what will move AI forward.

An 'AI agent' is just a chatbot that is told, as part of its system prompt, to type commands on a REPL-like interface. It's still processing pure text-based requests and responses; they're just not restricted to natural language.
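
To make that framing concrete, here is a minimal sketch of such a loop. It assumes a made-up text protocol (`CALL <tool> <arg>`); `run_agent`, `fake_model`, and the `upper` tool are hypothetical stand-ins for illustration, not any real agent framework's API.

```python
from typing import Callable, Dict


def run_agent(
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    task: str,
    max_steps: int = 5,
) -> str:
    """Feed the growing transcript to the model; run any tool call it emits."""
    transcript = f"TASK: {task}\n"
    for _ in range(max_steps):
        reply = model(transcript)  # text in, text out
        transcript += f"MODEL: {reply}\n"
        if reply.startswith("CALL "):
            # Hypothetical protocol: "CALL <tool_name> <argument>"
            _, name, arg = reply.split(" ", 2)
            result = tools[name](arg)
            # The tool output goes back in as plain text
            transcript += f"RESULT: {result}\n"
        else:
            # Anything that is not a tool call is treated as the final answer
            return reply
    return "gave up"


def fake_model(transcript: str) -> str:
    """Scripted stand-in for an LLM: call a tool once, then answer."""
    if "RESULT:" not in transcript:
        return "CALL upper hello"
    result = transcript.rsplit("RESULT: ", 1)[1].strip()
    return f"The answer is {result}"


tools = {"upper": str.upper}
answer = run_agent(fake_model, tools, "shout hello")
print(answer)
```

Swapping `fake_model` for a real LLM call would not change the loop: the "agent" part is just the harness parsing text and appending results to the transcript.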

A lot of people don't know this. Also, the chatbot (ChatGPT) itself is a next-token predictor (the GPT) that's been given an initial text that says "pretend to be a chatbot..." and asked to complete it. The coherent chatting behaviour is something that's emergent.

Later on someone figured out that if you asked it to output its reasoning before it gave a response, its output would have more logical coherence, as though the reasoning tokens functioned as a scratch space for it to work on.

In the end it's all next-token prediction.


No, chatbots are LLMs trained for question-answering through RLHF (it's not just a prompt). But yes, if you just zero-shot prompt a bare LLM you can still "talk to it", and you are correct on everything else as far as I know.

A lot of people don't know this. Also, the human brain is a squishy lump of meat that's been given a childhood and the prompt "act like an adult", and asked to behave. The coherent chatting behaviour is something that's emergent.

Later on someone figured out that if you shove Adderall in it and tell it to think before it speaks, its output would have more logical coherence, as though the concentration drug functioned as a scratch space for it to work on.

In the end it's a squishy lump of meat.


We know from living as humans that we have experiences.

We have no such evidence that LLMs do.

That's a pretty significant difference between the next-token predictor and the squishy lump of meat.


How much must one tie their self-worth to a chatbot to debase themselves like that? To think that a winner in the arms and intelligence race of the animal kingdom, a member of the species that made this chatbot, would put themselves down like that in defense of thoughtless silicon is absolutely laughable and depressing at the same time.

I'm merely pointing out the logical fallacy of thinking complex systems can't arise from simpler components in an obtuse fashion. Ants are stupid individually, yet they're able to create giant structures in the wild. Hating on AI and calling it next word prediction isn't going to save anyone's jobs. Organizing will. Voting will.

They are chatbots trained for tool use; it's not just a prompt.

An AI agent and a chatbot are both applications built using LLM inference as a primitive.

Yeah and a car is just an engine connected to wheels.

Yeah. LLMs are fundamentally a batch-based system, and we smear a veneer of liveness and autonomy on top.

Not even 4 years old yet. This tech curve has been insane.

I still use LLMs in much the same way as when ChatGPT was launched. There has been progress, but I think the real leap was 2020-2022.

Not even the typical lifecycle of a corporate PC or laptop. It is pretty wild.

If you upgrade your 8-year-old phone, the many incremental upgrades will be very noticeable. From my personal experience the LLM space is also moving at a faster pace than the phone industry at the moment, but at least from a financial perspective I would expect it to slow down sooner rather than later.

This was my exact thought as well. I think mythos could still be a huge leap, but especially as the IPOs get closer it seems like we're approaching the iPhone 10 moment, where anything after is just improvements at the edge.

But (maybe because it was hardware) that took 10ish years, while it seems like the slowdown here only took about 4.


Are we supposed to have two cars?

Anthropic is getting capacity from Colossus 1, not Colossus 2, it sounded like. The initial Colossus capex was under $5B, making that an even more astounding payoff.

Edit: The S-1 states both are being leased, so the $20-25B initial investment is probably more relevant.


The S-1 states that it gets capacity from both Colossus 1 and Colossus 2.

... and a sign Anthropic couldn't find enough compute anywhere else, so they had to bite the bullet. Interesting.

I imagine a huge proportion of their users are under 30. The prompt examples included even use the telltale all lowercase (though apparently sama types like this too).

This is probably less pandering to Gen Z and more speaking their users' language.


This is very interesting. I don't see much discussion of interpretability in the day-to-day discourse of AI builders. I wonder if everyone assumes it to either be solved, or to be too out of reach to bother stopping and thinking about.


Mostly out of reach. There is a ton of research on figuring out how to do this coming out every day, including both proposals of new ways to do things and (often strong) critiques of old or recently proposed ways of doing things. Interpretability (esp. for large, modern models) is very, very far from being a solved problem.


Most interpretability techniques have yet to be shown to be useful for everyday model pipelines. However, the field is working hard to change this.


Along these same lines, I have been trying to become better at knowing when my work could benefit from reversion to the "boring" and general mean and when outsourcing thought or planning would cause a reversion to the mean (downwards).

This echoes the comments here about enjoying not writing boilerplate. The thing is that our minds are programmed to offload work when we can, and redirecting all the effort saved on boilerplate into going even deeper on the parts of the problem that benefit from original hard thinking is rare. It is much easier to get sucked into creating more boilerplate, and all the gamification of Claude Code and the incentives of service providers increase this.


As the founder of another product in this space - this is super impressive and well built. Great demo video and congrats on top of HN! Getting this smooth UX and data behind the scenes is not easy.



It is also possible that this "world view tuning" may have just been the manifestation of how these models gained public attention. Whether intentional or not, seeing the Tiananmen Square reposts across all social feeds may have done more to spread awareness of these models' technical merits than the technical merits themselves would have. This is certainly true for how consumers learned about free DeepSeek, and it fits perfectly into how new AI releases are turned into high click-through social media posts.


I'm curious if there's any data to support that conclusion. It's hard for me to accept "They did the censorship training on DeepSeek because they knew consumers would love free DeepSeek after seeing screenshots of its Tiananmen censorship."

(the steelman here, ofc, is "the screenshots drove buzz which drove usage!", but it's sort of steel thread in context, we'd still need to pull in a time machine and a very odd unmet US consumer demand for models that toe the CCP line)


> Whether intentional or not

I am not claiming it was intentional, but it certainly magnified the media attention. Maybe luck and not 4D chess.


I think an underappreciated reality is that all of the large AI labs, and OpenAI in particular, are fighting multiple market battles at once. This is coming across in both the number of products and the packaging.

1, to win consumer growth they have continued to benefit from hyper-viral moments; lately that was image generation in 4o, which likely was technically possible a long time before it launched. 2, for enterprise workloads and large API use, they seem to have focused less lately, but the pricing of 4.1 is clearly an answer to Gemini, which has been winning on ultra-high volume and consistency. 3, for full frontier benchmarks they pushed out 4.5 to stay SOTA and attract the best researchers. 4, on top of all that, they had to, and did, quickly answer the reasoning promise and the DeepSeek threat with faster and cheaper o models.

They are still winning many of these battles, but history highlights how hard multi-front warfare is, at least for teams of humans.


On that note, I want to see benchmarks for which LLMs are best at translating between languages. To me, it's an entire product category.


There are probably many more small battles being fought or emerging. I think voice and PDF parsing are growing battles too.


I would love to see a stackexchange-like site where humans ask questions and we get to vote on the reply by various LLMs.


Is this like what you're thinking of? https://lmarena.ai


Kind of. But lmarena.ai has no way to see results to questions people asked and it only lets you look at two responses side by side.


I agree. 4.1 seems to be a release that addresses shortcomings of 4o in coding compared to Claude 3.7 and Gemini 2.0 and 2.5.


Growing up in Buffalo, New York, I only once as a kid saw one flying, while on a camping trip in a remote state park. Now, you see one almost every day on the coastline of Lake Erie. They are so much bigger than other birds that you will notice even if you are not on the lookout. Their scale is astounding compared to seagulls.

They have also come back to the Potomac and Washington DC which is nice.

