Hacker News | qsort's comments

It's almost certainly a reference to Lovecraft actually:

https://en.wikipedia.org/wiki/Cthulhu_Mythos

Hopefully future models will be kind enough not to behave like malevolent gods.


The word mythos means roughly the same as "myth" and dates to 1753.

Why do you think that?

If you're a "React person", as the article puts it, friendly reminder that you can render components to HTML and serve that to the user.

I have done exactly that on a project that was under similar constraints. The UI models live in .tsx files and the browser gets pure HTML with zero JS by default.


These are the results from the website they link in the paper:

https://math.sciencebench.ai/benchmarks

I take the "2 unsolved" claim to mean "not solved by any model in any configuration in any stage with any number of attempts"; the "benchmark results" are much lower. To be clear: it's extremely impressive. I still remember being in utter disbelief when models started solving AIME problems, and this is obviously several levels above that.

It's also interesting that OpenAI models perform that much better on math and math-adjacent stuff. I assume this comes down to differences in post-training?


If you're trying to compare what the models are good at, important to note that the different models did not run with the same settings. In one case they also retried with GPT until it answered all the problems but did not retry with the other models.

GPT has 5 effort settings and they picked the highest (xhigh). Claude has 5 and they picked the middle one to avoid having to retry when it timed out. Gemini has medium or high effort and they picked medium.


The difference between GPT and Gemini concerning the "retry until..." can almost be ignored. I did rerun GPT a few times, but that was still way below what Gemini was not able to answer at all.

Look, I've never been someone who mindlessly hypes AI companies, as a matter of fact I think they have serious leadership problems across the board, but you people are straw-manning them so badly it actually makes me sympathize with them.

They aren't saying they have fully automated luxury AGI, they specifically list the ways models fall short of that bar and caution against people taking the 8x figure as the actual uplift number. At the same time they recognize that 80% of new code is now AI-authored, when two years ago those models were little more than toys. And frankly that checks out: if two years ago you told me we'd have something like Opus 4.8/GPT 5.5 I would have rolled to disbelieve.


> At the same time they recognize that 80% of new code is now AI-authored

I can set up a loop that will write a trillion lines of code automatically, but how much of it is actually useful? Or are we back to counting LoC because there's no other metric for these systems that anyone can rely on?


It's 80% of new code they shipped that is AI-authored.

Would you ship pointless code?

I do tend to agree though, it could be that AI solves problems with more code than a human would. What you need to measure is the value the code brings and how much of that is done by AI, hard to get an objective measure of that though.


> Would you ship pointless code?

I wouldn't, no. I don't see evidence that the engineers at Anthropic are similarly cautious, however. They describe Claude Code as "basically a game engine" when it's literally a TUI app, and it eats memory for no apparent reason. I fully believe that Anthropic would ship pointless and garbage code. Especially if it's being written by an LLM.


I could write a bash script that copies a codebase repeatedly in the pre-AI past as well, but I didn't do that because I wasn't stupid. More than 80% of my code is now AI-generated, and trust me I'm still not stupid. It was 0% only a year ago.

Who says LoC is the only metric we should rely on? A software product should first and foremost meet user requirements, functionality and performance. Judging from the sensational rise of Anthropic's user base and revenue, I think we can safely say they're in that ballpark.


I'm dumb as a rock and I don't have a PhD, but since ~1 year ago I started forcing myself to do small bits of coding and math manually.

I'm not noticing a "cognitive decline" per se, but I do see I'm a lot "lazier", even stuff that used to be routine when I started coding now feels heavy.


>I'm not noticing a "cognitive decline" per se

The funny thing is, maybe not noticing one can be the actual sign of it :)


Yes, precisely. Assessing your own cognitive skills is dubious. I’m pretty certain I’m less clever than I was when younger but if I find a problem tough now maybe 25 yo me would also have struggled?

That’s the most important thing. If we keep reading, maybe we can hold our own.

>even stuff that used to be routine when I started coding now feels heavy.

The same weight feeling heavier is a sign that your muscles are weaker :)

There are many areas in life where we look back a few decades and think "people used to do it that awkwardly?" And yet results were better. I think the process of removing friction has just served to destroy our ability to concentrate and tolerate difficulty.


> but I do see I'm a lot "lazier", even stuff that used to be routine when I started coding now feels heavy.

Not getting that quick dopamine hit the LLMs give you..

Some say you can re-train your system to get back the dopamine hits you used to get from other things, like the enjoyment of the "old fashioned" manual coding and math. Getting there is hard work. And YMMV.


I just do things manually and ask LLMs to check my work. That seems to be working great for me.

I had the most Russian of Russian bosses when I was in college. My first day on the job he so eloquently stated, "I am not your mother. Do not come to me with problems. Come to me with solutions. I want to know what you tried and what did not work."

His advice has served me well in many areas of life too. I try my best to treat LLMs no differently for domains I care about (not one-off little questions here and there).


What I would like to do is double model post-check, with a form of "debate", to better catch edge cases.

Unfortunately, I haven't found a way to set that up as I envisioned it, for the time being.
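
For what it's worth, the control flow of such a "debate" can be sketched in a few lines. This is only a rough outline under loud assumptions: `Reviewer`, `debate_check`, and the two stub reviewers are hypothetical names invented here, and in a real setup each reviewer would wrap a call to a different model rather than return canned text.

```python
from typing import Callable, List

# Hypothetical interface: a reviewer receives the work plus the other
# reviewer's latest critique and returns its own critique as text.
Reviewer = Callable[[str, str], str]


def debate_check(work: str, a: Reviewer, b: Reviewer, rounds: int = 2) -> List[str]:
    """Alternate between two reviewers, feeding each the other's last critique."""
    transcript: List[str] = []
    last = ""  # no critique exists before the first turn
    for _ in range(rounds):
        for name, reviewer in (("A", a), ("B", b)):
            last = reviewer(work, last)
            transcript.append(f"{name}: {last}")
    return transcript


# Stub reviewers standing in for real model calls (assumption, for illustration).
def stub_a(work: str, other: str) -> str:
    return "no issues" if not other else f"responding to '{other}': check empty input"


def stub_b(work: str, other: str) -> str:
    return f"disagree with '{other}': off-by-one in loop"


transcript = debate_check("def f(xs): ...", stub_a, stub_b, rounds=1)
for line in transcript:
    print(line)
```

The transcript still needs a human (or a third pass) to decide which objections are real edge cases; the loop only makes the two checkers respond to each other.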


Absolutely this, I'm the same as you.

And I'm just afraid this is what cognitive decline feels like from inside the deteriorating mind.


> I'm not noticing a "cognitive decline" per se, but I do see I'm a lot "lazier"

These are correlated - it just hasn’t happened in a large enough amount for you to have clearly noticed it yet.


I do a similar version of this, where if I notice a mistake in generated code, I fix it manually (or at least attempt to) instead of telling Claude to fix it.

This is the right balance for me as well.

I use an agent to generate a first-pass attempt, and then (deadlines willing), I manually read every line at least once so I understand what the code actually does.

Then I manually fix the inevitable slop that is mixed in with the good stuff, and only once the code is up to my personal standards do I send it.

This probably reduces my “AI performance boost” to 30-50% instead of the huge gains reported by others. But I retain the ability to reason about the codebase and use AI much more precisely when I’m trying to troubleshoot production outages or subtle bugs — something I notice the rest of my team struggles with, since adopting “agentic workflows” everywhere.

I think actively working to retain some cognitive flexibility and “muscle memory” around coding tasks is going to be rather advantageous in the long run.


Pure copium, but what can you do with the deadlines.

Same, but also because it feels like it takes longer for an LLM to do it. I think that's something people who are into gathering personal metrics should do - measure how long it takes to type a prompt / have the LLM fix things vs just doing it yourself.
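
A minimal sketch of what that kind of personal measurement could look like, assuming you are willing to wrap each task in a timer by hand. The `TaskLog` class and the "manual"/"prompt" labels are made up for illustration and do not come from any existing tool.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import mean


class TaskLog:
    """Collects wall-clock durations per label, e.g. "prompt" vs "manual"."""

    def __init__(self):
        self.durations = defaultdict(list)

    @contextmanager
    def timed(self, label):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed seconds even if the task raised.
            self.durations[label].append(time.perf_counter() - start)

    def averages(self):
        """Mean duration in seconds for each label seen so far."""
        return {label: mean(times) for label, times in self.durations.items()}


log = TaskLog()
with log.timed("manual"):
    pass  # fix the bug by hand here
with log.timed("prompt"):
    pass  # type the prompt and wait for the LLM's fix here
print(log.averages())
```

A handful of samples per label is probably enough to see whether the "it feels slower" impression holds up.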

LLMs are making me smarter. I have more code to read!

The object-level discussion is interesting, but I disagree with the premise to such an extent it feels like a moot point. It feels like the article doesn't play out the line to its logical conclusion.

Why would agents want GUIs made for humans? It's already the case that, like everyone who's good at computers, agents want a terminal and good APIs, not some ad-ridden crap.

If anything, AI is a reason why it will never be the year of the Linux desktop, but also why it doesn't matter anymore: if the higher-order bit of productivity is defined by AI, then my tmux+vim is as good as your Visual Studio.


  > tmux+vim is as good as your Visual Studio.
You probably don't need tmux. The utility is really when you're remoting into machines and want to keep your session (or are too lazy to use nohup or disown)

Your terminal should split panes for you and do tabs. Ghostty is my preferred but use whatever. And fwiw, even if your terminal sucks, vim can do all this for you too (:term), so you don't even need to leave vim.

  > vim is better than your Visual Studio.
FTFY ;)

Side note: just because you can live in the terminal on Linux doesn't mean GUIs can't exist or are even second class citizens. The real beauty is being able to have both. You can have a platform that is usable for most people while not fucking over power users. Wild concept, I know


<strongly offended noises>

Everybody needs tmux, especially locally.

My terminal splits panes (which I don't use), but what if I want to open two terminals that share the same set of splits? Can't. But tmux can!

What if I want to SSH back into my desktop (because I'm on a laptop or whatever) and grab something from my desktop terminal? Can't. But tmux can!

Vim splits and the vim terminal are poorly implemented. Technically, yes, they work. But you'll run into a lot of issues. I know, because a few years ago I went down the same path: Why do I need tmux, when I have vim!? ... I quickly learned why I needed tmux.

I agree with your side note: plasma+kitty+tmux and a few support scripts:

(please don't criticize my scripts; these were never meant to be shared, and it's a disaster, but it works for me)

I have this script (https://doc.xn0.org/tmuxedkitty-newwindow.sh) bound to WIN+T; it opens kitty, and either creates a new tmux session if there isn't one or attaches to the existing session and creates a new pane.

Then, I have my insane (I understand I am insane, but it works for me!) tmux config file: https://doc.xn0.org/.tmux.conf

Then, I have my insane zshrc that auto-titles my tmux windows: https://doc.xn0.org/.zshrc

Using titles from: https://doc.xn0.org/tmux-window-titles

I have put way too much thought and time into this...


  > but what if I want to open two terminals that share the same set of splits?
You want clones? I'll admit most terminals can't do this (some can), but I'm struggling to see the use case. What's the advantage of having 2 windows displaying the same information?

  > What if I want to SSH back into my desktop
Agreed! That was the explicitly stated use case I said tmux was for[0]

  > Vim splits and the vim terminal are poorly implemented.
Completely fair, and I avoid them for exactly those reasons. But they're still handy in a pinch and they're good to know about

BUT tmux is also poorly implemented. Start trying to use sixel (or kitty graphics) in your fzf previews, yazi, or whatever you're displaying things with. This is a big pain point.

  > please don't criticize my scripts
Do you want friendly comments? All code sucks so I'm not going to call you dumb or anything. But do upload them somewhere so I don't have to download; 0x0.st is perfect for this use case.

  > Using titles
Your terminal doesn't do titles? What terminal are you using?

[0] I'll also admit Claude Code is another use case. But that is because it is so poorly written, not because the terminals suck. I absolutely believe Dario when he says Claude does most of the coding... it shows...


Just because tmux doesn't work for you doesn't mean it can't be useful for someone else. I for one really appreciate having the same interface and keybinds across several devices and I've never felt a need to look elsewhere.

  > having the same interface and keybinds across several devices 
I'm a bit lost. I use my dotfiles for this.

If it is a machine I control: I control the terminal so there's no issues.

If it's a machine I'm SSH'd into: that's my explicitly stated tmux case right there.

If it's a machine I don't control: well I can't do anything anyways, so conversation is moot. This situation is exceptionally rare though (where I can't even do local installs)

I agree that you should use what works for you, but I'm curious what you're getting that isn't already offered by your system


I'm with you that people are insanely hyped about Claude Code in particular when e.g. Codex isn't far behind (and with recent models I actually prefer it).

But I'm going to need a citation for this:

> a lot of GenZ and young Millenials who were already bitter at their employers have used the tokenmaxxing push to sabotage the AI

The 3 people on reddit doing this don't even register on a company budget. What seems more plausible to me is that budgets were calibrated to spending before agents were actually useful, and late '25/early '26 changed the pattern significantly.


Codex is actually significantly better than Claude Code now, assuming you have a clear idea of what you want to do and how. Claude's secret sauce is that it'll run off and do stuff that's mostly right without a lot of prompting, but that also makes it willful/disobedient and causes it to be bad for "finishing" work, since it'll circle around your objective in an opinionated way.

https://finance.yahoo.com/sectors/technology/articles/nearly...


Hey, would you mind elaborating a bit on this:

> assuming you have a clear idea of what you want to do and how

I mean, if I have a sufficiently clear idea of what and how, then surely just coding it manually would work significantly better. Unless maybe I am a painfully slow typer.

Without some level of "actually I'm not sure exactly" permitted, then I'm not really sure what LLMs bring to the table.


Even when you have a clear idea of what you want, there are still hundreds of decisions you need to make while building it, both big and small. Everything from what to name your database tables and columns to what data structures are optimal and what the API payloads should look like and what the tech stack should be. Anyone with a sufficient level of experience in this field has made these types of decisions dozens of times and at some point it becomes more practical to have an AI do it for you and for you to quickly skim it.

For example I want to make it so that users receive an email when their password is changed. I can either do it myself, which requires reviewing and remembering code I’ve written five plus years ago and then wiring everything up and obsessing over the wording of the email. Or I can give a two sentence instruction to the AI, work on something more meaningful while it is doing its thing, and then test it in under 60 seconds when it is done.


If I want to create a web app with a back-end, database, and some services, and I tell Codex to do that with a specific stack and using specific paradigms to keep the code performant and maintainable, it's still a win over coding it by hand, as models can emit ~200 chars/sec compared with maybe ~10 for a really fast human. There's an up-front planning cost, and you will have to go back and massage some of the outputs a little bit if you're particular, but for sizeable tasks it still comes out to be a big win.

If you're just working on a single React component or an algorithm to do stuff with data, there's less chance to amortize the up-front planning and verification, so it comes out more of a wash.
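
To put rough numbers on the amortization argument, here is a back-of-the-envelope sketch. The ~200 and ~10 chars/sec rates come from the comment above; the 30-minute planning-and-verification overhead and the function names are assumptions picked purely for illustration, and the model ignores thinking time on the human side.

```python
def human_seconds(chars, human_cps=10.0):
    """Time for a fast human to type `chars` characters."""
    return chars / human_cps


def model_seconds(chars, overhead_s, model_cps=200.0):
    """Fixed planning/verification overhead plus the model's emission time."""
    return overhead_s + chars / model_cps


def break_even_chars(overhead_s, human_cps=10.0, model_cps=200.0):
    """Output size at which both approaches take the same time."""
    return overhead_s / (1.0 / human_cps - 1.0 / model_cps)


OVERHEAD_S = 30 * 60  # assumed: 30 minutes of planning and review

for label, chars in (("single component", 2_000), ("whole web app", 200_000)):
    print(label, human_seconds(chars), model_seconds(chars, OVERHEAD_S))

print("break-even chars:", round(break_even_chars(OVERHEAD_S)))
```

Under these assumed numbers the small task is dominated by the fixed overhead, while the large one is dominated by typing speed, which is the "wash" versus "big win" split described above.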


I don't know what this table is supposed to measure but it doesn't check out.

(C, C++) and (JS, TS) are almost source-compatible, chances are you can rename test.c to test.cpp and test.js to test.ts and you're done. Yet they're showing massive differences?

Also most of the compiled languages with no runtime should get results that are very close to each other: good compilers should produce similar object-code for this type of microbenchmark.

Not to mention this is really measuring the implementation, you can't measure a language. Mike Pall wouldn't be down there, and JS/TS wouldn't be up there without V8 and friends.


I imagine you would want to test "idiomatic" code for these comparisons. It doesn't make much sense to compile with C++ and write everything in C.

That doesn't explain why TypeScript is insanely less efficient.

I truly don't get Google's move.

I'm sure the model is fine, but it's not Google Search, and when I want Search I want Search. If I wanted to ask an AI, why can't I ask the one from my subscription... that I'm already paying for... that's actually good... that can also search the web?

I assume it's a play to test the waters for how the ad market is going to work, because as a product I really can't see why I would ever use it. Dropbox comment moment incoming?


They want to capture more of the value that was previously going to others. That's basically what this has all been leading to. Why let a cooking website get visitors and ad revenue when they are free to take the content and show it as their own? Now they are going to do the same to e-commerce. Either they are going to let customers buy their products through Google's interface, or they won't be discovered. No more ownership of the customer relationship. Stores will be a backend warehouse and manufacturer now with Google taking a percentage of all profits.


> Why let a cooking website get visitors and ad revenue when they are free to take the content and show it as their own?

I think this is a step beyond that - why should people be creating cooking websites when you can ask an LLM how to cook a given thing while, indeed, serving their own ads? It's the continuation of the "we own content other people produce" policy


Google already killed cooking websites - when it refused to show them in search unless they added long slop content to them. And it killed the blogosphere when it decided blogs won't be found if they just contain content without a deliberate SEO play.

And I think the rest of it will end the same way. People will be significantly less eager to do all that free work when no one will be able to find it.


recall the pizza sauce glue trick, to stop cheese from sliding off.

there are other such goodies like mashed potatoes with broken lightbulb gravy, or fiberglass omelette, enjoyed by beldar conehead.

i wouldn't trust an AI for any recipe that i don't have personal experience with.

the safety rails are not very strong yet.


If you are half decent at cooking it is actually pretty helpful to explore cooking something new. Just like coding it is nice to get specific answers to your specific question and it is pretty easy to reason about the quality using your own experience.


I would be interested in an example of this. LLMs will often combine recipes from random sites. If you're experienced enough at cooking to reason about the quality _for something new to you_, what value is there in an LLM here? I don't see any similarities to coding here.


To me the similarity is I know exactly what I want to do but cannot really remember syntax (coding) or key variables (cooking) like temp and time. But I have enough experience to know if the output makes sense. Either one I can ask an llm a specific question and get a somewhat reliable specific answer that I feel comfortable parsing… this is actually one of the reasons I think I am eventually going to be on the local inference bandwagon. It is not far from being good enough for my use cases. And I will be able to skip the inevitable enshittification.


In terms of temp and time, surely if you know enough to judge its correctness, you would not need it in the first place? Code correctness is rather objective and easily testable. Cooking is rather subjective and only testable with great effort and time. I just checked 4 models on a 4lb pork shoulder in an oven. Flash was super off, suggesting you could pull at 145-150F for a sliced roast. Yeah, you could and it would fucking suck. The per lb time and total time also didn't add up. The others were better but varied. Only one (opus) thought to ask if it was bone-in. If you're very specific you could certainly have it aggregate a bunch of recipes to get a sense of what's close to a good answer, but ultimately it depends on what sources it chooses.

I could see LLMs being helpful to explore what's out there, like finding similar dishes or dishes involving a specific set of ingredients or dishes involving a particular technique, but a pretty poor tool for the actual technicalities of cooking or more importantly the uniquely personal aspects of food culture.

I dunno. I'd just buy Larousse and On Food and Cooking.


I recently roasted a 5lb leg of lamb. Temp was pretty obvious but I had never cooked meat this way so an idea on time is really useful. Google search is a disaster for this kind of question. And I guess I have never encountered a good general cook book that I feel comfortable building off of.

I think all the science of cooking ones are a good bet for generalist knowledge. Some of the more textbook-like ones as well. The Food Lab and On Food and Cooking stand out, but there are many others. I'm not sure I'd classify them as cookbooks.

The Food Lab, for example, covers buying, storing, and cooking lamb + a guide for a 5-7lb boneless leg across 5 or so pages. Kenji goes to great lengths to build intuition. I'm sure Larousse, which is more of an encyclopedia, covers lamb quite extensively but it's probably more terse.

The internet can be an excellent source, but like most things it depends on who is writing it down.


I agree, and this response was following OP's example. But the point still stands - the goal is to outsource, in a weird way, the results being served = Google as such wouldn't need to pay for content. Now, if accuracy of such sources doesn't matter (or is good enough) for the casual user...


Given most cooking or recipe websites have been AI slop for a few years now......

I'll stick with my mom's handwritten recipe book.


There are virtually no combinations of food which are toxic, you can mix any food with any food and, while it might not be good, it will still be food. (The only exception I know of is alcohol and mushrooms containing coprine, e.g. inky caps)

Point is, unless you're stupid enough to add glue or broken glass to your meal just because a recipe told you to, it's perfectly safe. More than just safe, LLM recipes these days are utterly boring in their normalcy, and, unlike cookbook recipes, can dynamically adapt to what you actually have in your pantry.


What really sucks is that Google pushed actual content creators out of the way in the first place. That is horrible. I think they should be challenged on this. Food bloggers, recipe writers, and creators have helped shape a huge amount of food culture, and they deserve to be protected rather than erased. If this kind of theft continues from the AI industry, I'm not sure what type of culture is going to be left or what is going to replace it. I hope humanity is going to find a creative way around it, but I'm also aware how easily manipulated the masses are.


Their assumption is that all relevant culture has already been invented and capturing the status quo is enough to get 80% of the benefits.


Evidently you're not familiar with Swedish Lemon Angels.


You can also tell the LLM exactly what you have in the fridge or what allergies you have and get customized recipes. It’s just a better experience, 2026 is rough for a recipe site.


Would you trust the tool that recommended putting glue on pizza to give you a good recipe?


I have/make rice starch glue. Can you put it on food? How are you supposed to know whether it's food safe?

Okay, so you don't trust LLM, so you go to a website instead. And... LLM-generated pages are SEO'd to get the top links. So you can't trust any website now (shoot, so much nonsense even before LLM, just more obvious to some of us). So basically everything on a computer is untrustworthy, directly from an LLM or not, unless you got yourself a copy of Encarta '97.

So you pick up a book at the local library. Librarians picked some books to order in subject matters they aren't expert in. How do you know those are accurate and safe? If the book says to use rice starch glue, how do you know the author didn't just copy that from an LLM? Or make it up?

Trust is fading entirely.


Presumably you test some things and use common sense for others. Like if you search for "grain filling oak" using an engine like Kagi (because Google just sells you the same product repackaged over and over) then you'll get people telling you variously to buy this grain filler compound that worked on their particular project, or you get people telling you to use drywall patch compound, or watered down wood filler.

The thing is, these things do produce some kind of result that looks like what you want. But it is still up to you to test these things on a project before you rely on them for whatever it is you really wanted them for, and that requirement doesn't go away just because you sourced the information from some LLM, or a book at the library, or Nick Offerman, or whoever else.


Yes.

There isn't a robot straight up putting glue on my pizza. I will be following the recipe myself, and since I trust myself to accurately detect that glue doesn't belong on food, I have zero issues with doing this.


Got anything from 2025 or 2026?

AI got better over the last couple of years, and you didn't keep up, and because that's not going to stop, it will eventually become a problem for you.


The fundamental technology is still the same, just with more fossil fuel burning.

> because that's not going to stop, it will eventually become a problem for you.

How? Will it stop being possible to cook without AI?


> The fundamental technology is still the same, just with more fossil fuel burning.

That's like saying the fundamental technology behind an Egger-Lohner Hybrid and a Prius are the same. Technically true, but if you use that truth as a basis for decisionmaking, you're doomed. A modern AI model wouldn't make such a foolish mistake, so you'd better not make it yourself.


Current AI models still make mistakes all the time.

https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:allu5vs...


(Shrug) Ask a free chatbot model and get what you paid for.

Aside from that, please let me know when you find a machine or a human that never makes mistakes. I'd like to invest.


If the user puts glue on their pizza because a computer said so, that's a human problem.

The computer generated recipes can be useful as inspiration, but of course common sense is required.


This "common sense" you refer to, is it the same common sense Babbage was subject to?

"On two occasions, I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able to rightly apprehend the kind of confusion of ideas that could provoke such a question."

~ Charles Babbage


If you freely follow a recipe telling you to put glue on your food, I also don't trust you cooking anything and I definitely don't trust you coming up with your own recipes.

This video tells me otherwise: https://www.youtube.com/watch?v=UDQds7VZkfg (Cold Ones - We Drank AI's Horrible Cocktail Ideas). This is a tongue-in-cheek response though, as LLMs have improved significantly since then.


> You can also tell the LLM exactly what you have in the fridge or what allergies you have and get customized recipes.

Can you really though? Are the results delicious? I've never tried that.


It is pretty good, yeah. I usually ask what traditional recipes (with some time/difficulty limits depending on time availability) might fit with the ingredients I have and let it suggest substitutions.

I wouldn't just ask it to make me a "novel and interesting recipe" giving it a bunch of weird ingredients that don't fit together. It would probably try its best, but garbage in, garbage out also works for food! At that point I ask it what I should get from the shop to make a recipe with x ingredient that I have in the fridge and want to finish.

It's also pretty good with meal planning if you do that, it can estimate portions with calorie breakdowns etc...


It's worse than you think, many recipe sites do not taste test their stuff at all, and often have very stupid instructions.

That being said, an LLM can give creative ideas, mix and match components, but you should not trust the details at all.


Case in point, when "minced meat" and "mincemeat" were mixed up: https://metro.co.uk/2019/12/09/american-website-includes-act...


Damn, TIL. Now “Operation Mincemeat” seems less macabre.


Is this mushroom edible.jpg


> You can also tell the LLM exactly what

You can - but it's not advisable, not in the least.


It is the same thing as when they pushed for AMP. They wanted to prevent traffic from leaving google.com then too.


In that case at least they could point out that end users got better results with AMP than they do with news sites w/o ad blockers. The AI results are just wrong so often I don't really get it.


> The AI results are just wrong so often I don't really get it

They believe they won’t be wrong for long.


The results are not wrong, they are AI. Google wants that to become a distinct thing that is neither. What's a better answer for Google than one that generates more usage? If we all push in the same direction we can make AI work, we just need to accept we will need to hold its hand for a while.


I think this is sarcastic but man some people really do have some wild defenses for LLMs so I can’t be sure lol


Maybe it's high time to burn it all down.

Block Googlebot from your sites.

Let's go back to webrings.


It's certainly long been clear that Google is phasing out even the idea that they serve end-users "links" to other websites. They're just refining the idea and making it more and more explicit. It absolutely places them in an obviously adversarial position to every single other website on the Internet, and anyone who continues to cooperate with Google today is probably handing Google the tools to put them out of business. Unfortunately, whole generations of people have grown up learning that the safest and easiest way to navigate to a website is to type some version of the brand into their browser (which Google likely owns outright) and click the first thing Google spits back, so Google enters this battle holding most of the cards :(


Exactly! They also have been letting the results of google search get seriously degraded by ads. Would many people prefer AI over google search circa 2010?

They killed their competition and now they will give you the product that gives them the most money.


Also — it's objectively a better search product to give users what they're looking for right away.

Though that's not to say they're acting altruistically here.

Google seems to be racing toward a new dark pattern where users learn to trust and rely on the AI for neutral, smart, objectively correct answers, which boosts trust in its sponsored product recommendations. Super gross.


Why would anyone go to google anymore tho? If it doesn't furnish results it's just a chatbot


I would assume that they've A/B-tested any such important change extensively and basically know that it won't affect their numbers for the worse.


Given my own time at google, I highly doubt these a/b tests are constructed to actually yield a better product rather than push pet products

Then we should take your word over mine. My assumption was that those A/B tests will lead to products that do increase the numbers they were measuring (retention, conversion,...) at the expense of enshittified UX (up to the point of things feeling objectively broken, like notification badges re-appearing for the same items, settings that reset after user changes, search results missing,...). At least that was my explanation for how products by major tech giants like LinkedIn, Facebook, Outlook,... could end up being shipped with such flaws. What would you say?

This has been their MO with their search for a decade+ now. "Native" results hiding actual search results below the fold killed many 2010s era websites that relied on search traffic.


"Greed is bad"


> but it's not Google Search, and when I want Search I want Search.

Not me. I really appreciate having both results simultaneously. I can scan the first couple sentences of the AI response, and if that already has the answer then great. I can expand it to see if there's more.

Or, if I see that the AI mode didn't understand my brief search query, I just glance at the search results below.

And often times, when I do need to follow a link, I find the source result links in the AI mode to be a better quality than the search result links.

It's the best of both worlds.


> I can scan the first couple sentences of the AI response, and if that already has the answer then great.

But how do you make the determination that the answer is good and you should stop reading the page? Vibes?


I think it depends on what you are looking for.

Most of the time I'm looking for something very specific that there are plenty of articles about, but clicking on the articles results in popups, banners and an unhealthy amount of scrolling to get to the answer.

AI overview provides me the answer instantly.

Think about stuff like "does china borders afghanistan". In those cases you can be confident that the AI overview is right and saved you time.

If it is a complex or niche question I tend not to trust the overview and go straight for legitimate-looking results


Popups, banners? What are those?


How do you make it without AI? Are you parsing through millions of pages yourself?


The LLM results are presented confidently and succinctly in a way that is designed to tell you “yes” or, if not applicable, it just mashes together statements (which often leads to a response that contradicts itself one sentence later). That’s not the same as you vetting search results.

Well before Google screwed it all up there used to be some correlation between top hits and what you were looking for. SEO has muddied the waters for many years now and it’s never been truly “merit based” or “objective” or whatever we want to call it, but generally speaking, the first results were the best by default.


That hasn't been my experience. It has been working really well for what I need.

SEO optimization totally ruined google search for me for the past few years


>that hasn’t been my experience

Ok, but it’s been mine. And clearly I’m not alone.

I feel like at this point any discussion about LLMs has an implied “my experience” because LLMs are super inconsistent due to not being refined tools at all. I’m sure your experience has been different, just like my experience has been different. I imagine you’ll want to chalk it up to operator error, but it sure seems like a lot of people have variations of my experience. If so many people are operating it wrong, then maybe the tool is poorly designed.

Understand that I use LLMs pretty frequently. I am not “anti-AI.” I’ve used production tools incorporating machine learning for years now. But LLMs simply aren’t the bespoke tools that these companies want you to believe, and they are definitely not a suitable replacement for search. They are simply too inconsistent and will hallucinate answers. Google search didn’t make up answers; it presented indexed sources that you vet in real time, which I find to be a far superior way to do research. I don’t like having to guess when an LLM is just making shit up as it asserts something with simulated extreme confidence. Not only that, you can take a correct answer from an LLM and just start saying “no, that is not right,” and it will start apologizing to you and generating other answers. That is a huge problem! I shouldn’t be able to “convince it” to give me a different answer.

Yes SEO made things objectively worse. Doesn’t mean we need to add another layer of issues on top of that.


No, you engage in what appears to be the lost art of media literacy and aggregate high-quality sources.


Right, so I have to do manual work going over 10000000 results? Or trust the SEO / Google algorithm instead?


No... When did you start using the internet?


great question, probably around 1998 ! How is it relevant to 2026 ?


You seemed confused about how to use a search engine combined with media literacy, thinking you'd have to parse 10,000 results.


Before AI people got the answer they needed from the snippets. That's the level most search queries are at.


Common sense.

The same way I make the determination as to whether a linked search result is good and I don't need to click on another search result.

It's not like non-AI webpages are inherently more trustworthy or anything. The internet is full of misinformation everywhere, you know?


It replaced some of my most used tools with google search. I used to be able to search "define inoculant" and I would get a definition, synonyms, and even a history of the word usage. Now it's replaced by an often mistaken AI summary. Even "inoculant synonyms" doesn't work.


Hope the answer in the AI response is right!


> I truly don't get Google's move.

Users aren't adopting their AI at the rate shareholders expect, so they now force the adoption at the cost of search.


According to Google, users are adopting it. They say AI mode is the most popular feature they've ever introduced, and is driving an increase in total search queries.

>Just one year after its debut, AI Mode has surpassed one billion monthly users, with queries more than doubling every quarter since launch. As people have realized just how much more Search can do for them, they’re searching more than ever before — so much so that last quarter, we saw queries reach an all-time high.

>Another place where we’ve been rapidly innovating is in the Gemini app. Last year at I/O, the Gemini app had 400 million monthly active users. Today, we’ve surpassed 900 million, more than doubling in a year. In that same time, daily requests have grown over seven times.


Isn't this essentially just sleight of hand? Google basically defaults to AI search now, doesn't it? So of course it will be 'fastest adopted'; it's what is shoved in people's faces.

If the results are garbage, or people have difficulty with it... Of course the number of searches goes up. That doesn't mean the product is better or that it's not resulting in brand damage.


Don't believe your lying eyes, AI results are better!


These are the same folks that removed the very useful Google cache feature because people weren't using it any more. What they forgot to say is they hid the feature beforehand.

Of course they have more AI queries every day. They have full control over what goes to LLMs and what doesn't.


Really smells like some high-ups' bonus was tied to these KPIs and they're guaranteeing that they can't lose.


While I'm not opposed to the idea that Google AI mode is so good that people use it more, I also feel like the average person only has so many queries per day. Google's statement would indicate that people had a number of queries that they just opted to ignore, because finding the answers was too cumbersome.

I'm not entirely sure I'm buying that, unless users keep prompting the AI to reduce the amount of reading they need to do. Sort of interrogating the AI, rather than reading a Wikipedia page.


The fact that users are using more search queries means they can't find what they want with a lesser number of queries. It seems that Google's PR team doesn't have an incentive to understand that, or thinks that everyone else is stupid.


My guess is that they are spinning it as "users enjoy talking to the AI instead of searching, so they do it more"

Rather than "users don't find what they want with the AI as easily so they have to spend longer with it"


AI mode isn't for queries, it's for questions. You ask it direct, specific things like 'how do I do <x> in <y>' and it provides a fast answer.

People have many more questions in their life than they do queries.


In programming forums like Hacker News people are incredibly detached from the average experience with technology; sometimes it is baffling.

Most non technical people I know asked questions to Google even before the AI overview. Instead of looking for the answer in seo-bloated articles, they find it in the overview.

I think google should improve in detecting the kind of query when I need a link that I don't remember, and deactivate the overview on those. If I search for "ryanair booking" I clearly need the url for booking a Ryanair flight, AI overview is useless


Relevant xkcd: https://xkcd.com/1497/

If programmers and engineers are saying "why would anyone want that?", odds are the product will be a gigantic success.


I mean, "AI Mode" is the default result when you Google something, so of course they're seeing high usage. Driving an increase in total queries is probably because instead of just Googling something and getting the right results like it was 10~ years ago, now you have to interrogate a chatbot or try multiple queries. I would think higher total queries is more an indicator that your search function isn't effective.


"I mean, "AI Mode" is the default result when you Google something"

No, it's not. AI mode is something you have to select (in the search window). There is an AI overview provided with your basic search results.


I agree with their assessment that '"AI mode" is the default' - https://ibb.co/Pz9LqKRb.

That's what I get, in the UK, logged out of Google, from a search in Firefox omnibar using "Google" as provider.

I'm aware that they have other things that can be described as AI modes.


That's AI Overview, just like it says at the top of the box.

AI Mode in that screenshot is the tab to the left of All.


> Driving an increase in total queries is probably because instead of just Googling something and getting the right results like it was 10~ years ago, now you have to interrogate a chatbot or try multiple queries. I would think higher total queries is more an indicator that your search function isn't effective.

I wonder how much the search results thing is related to language and locality. I have a hunch but I haven't really dug into it.

I live in the US, I speak English, and my browser is normally chrome.

The number of times I've gone to the 2nd page in Google search results you can probably count on one hand in the last 15yr or so.

I use the standard Google search things when I want specifics... Using quotes, site:news.ycombinator.com to search a site, or add a "-" to remove results from that site. I use a "+" when needed. Nothing fancy.

When people say they can't find things in Google search, I'm genuinely baffled. I have a strong suspicion that it has something to do with the combination of browser, locality, and language. Why? Could be tons of reasons for that, some probably anti-competitive on the browser side.

I have tried to use ecosia, start page, duckduckgo, etc. Was never happy with those results and always ended up back at Google search.

I just want to know what's different, you know? I look up some pretty obscure stuff sometimes.

Note: I do normally have my Google account logged in in the browser when doing search, however I have search personalization and history turned off, so that should not be influencing the quality of my search results compared to whatever "baseline" is.


It started when Google made a hard push to improve search for everyday people. They essentially nerfed "expert google skills" to bolster "noob google skills".

Regular people are/were really bad at using google, so google moved towards showing what it thinks you want rather than what you want. They paved over the skill gap between people who understood keywords and word order, and people who just typed in a quasi legible sentence to find something. In doing so though, they killed a lot of skill that people had developed with google for years.

Basically they made the game worse for pros so it could be better for amateurs. I have never heard a non-tech person complain about google getting worse over the years, and they seem to overwhelmingly use AI overviews now too.


I just don't know what I'm doing different, I'm just keyword searching and using a couple of inclusive/exclusive flags.

Was I the frog in the pot and now I'm cooked? I don't feel like I search Google any differently from maybe 2005 or so.


How do you get the inclusion/exclusion to work? My last few attempts to use “-x” really didn’t exclude what I expected, and almost all the results had “x”. I have seen massive changes since 2005.


> they seem to overwhelmingly use AI overviews now too.

Hard agree. The only thing I've ever witnessed another person do on Google (this is only an incredibly slight exaggeration) is:

1. Type a 'query' - either a brand/website name or some kind of stream of thought like "dishwasher error 03F" (without quotes)

2. Click or look at the very top thing in the results.

This used to mean 80% of the time they'd click the top ad, 20% the top organic result. Then they started putting non-clickable "answers" in that top spot, which would always be accepted as 'the right answer'. When those appeared, approximately no one would ever click any 'blue links.' These started out pretty reliable because they were just direct extracts from sites like IMDB: "Brad Pitt is 44 years old" etc.

Now it's like 60% of the time an ad, 40% of the time their bargain-basement-model "AI Overview" slop. Either way, approximately all users always just use whatever is on top and ignore everything else.


> Either way, approximately all users always just use whatever is on top and ignore everything else.

Wtf


>"a hard push to improve search for everyday people"

Citation needed. A hard push to change their search offering, sure. To improve it? Well, if by improve you mean 'require more interaction and viewing of more adverts on average before leaving' ...


Again, if you have been on HN since 2009, you are likely on the far fringe of Google's user demographics, which at this point is pretty much "The average human being on Earth".

I would bet all of my money that you never once did a Google search (pre-LLM mania, but maybe even after) that looked like

"What kind of clothing is best for when you are going hiking around the lake, so my feet don't get so cold?"

Sadly, this is how most humans have used a search engine for decades now.


I find that a weird assumption. Why would you expect HN people not to do such searches? They worked for years.

And you frequently ended up finding a discussion forum thread around that question, with relevant discussion under it.


Because OG tech nerds would google like

Forum hiking warm socks backpacking trail running

Which was basically a structured programmatic query that activated the old google algo just right.


Fwiw, I do/did plain language searches for the last couple of years, following Google's lead - I think the more natural language searches have only really been in the last 5 years.

I often use terse searches too, mostly when I forget to write it out longhand to satisfy Google - but either way it's getting less wheat with my vat of chaff in the SERPs and several times recently I've had to re-phrase to get anything useful.


this has not been my experience on desktop or Android. did you opt into something? are you accessing via browser search or Google.com?


> driving an increase in total search queries

I search more when I can't find the thing I am looking for. I search less when I find the thing I am looking for.

Second, it takes additional effort to not do AI search.


Yeah, seems obvious that more searches = results are worse. Who would go "Google sure is good nowadays, I'm gonna ask it for more things than I used to"

how many of those queries contain keyword groups such as "how do i get rid of the AI search?"


> They say AI mode is the most popular feature they've ever introduced, and is driving an increase in total search queries.

Technically, all the people who google "how do I disable this shitty AI mode in google" would count as "driving an increase in total search queries."

An easy way to make a feature popular is to force it on everyone. Then you can pat yourself on the back when 100% of your users are using it!


I remember when Internet Explorer was the most used browser. The fact that people were just using it to download Chrome doesn't matter to stats.


That doesn't make sense. Presumably AI search costs more


I think it's a multifold problem and they've chosen bad solutions.

1. To protect ad revenue they make search results worse to increase the number of searches by making people refine their searches. This made people upset because search result quality went down.

2. "AI everywhere!" put them in a panic, so they shoved an LLM into results, hoping it could pick through bad results and give good data to the user.

3. LLMs are expensive to run, so they're using a cheap model.

Cheap model + bad results = abysmal user experience.

There are too many groups with opposed interests fighting. Ad groups want worse results so people search more (not realizing this just drives users away). Search groups want a better product so they stop losing users, and the AI group is being given a bad name because management is using their worst AI product on search. So the whole experience is just garbage.


>1. To protect ad revenue they make search results worse to increase the number of searches by making people refine their searches. This made people upset because search result quality went down.

Why would this work? Were yahoo and askjeeves sandbagging their results too just so they can get more clicks?


> Why would this work?

Humans are predictable and hate change. For a short while it DOES work: people are used to great results, assume they're not using the best keywords, and they'll reformulate their searches. For a while. After a while of all searches being not as good as they used to be, people start looking for other alternatives, which is why DDG is seeing an uptick.

It's called enshittification. It's easier than improving a product.

> Were yahoo and askjeeves sandbagging their results too just so they can get more clicks?

No idea.


I don't know how much control Goog has over Youtube despite owning them but I do note in passing they removed dislikes, removed upload dates (apparently?), removed 5 stars. Easier to trick people into ads

The platform has been various kinds of hostile for a few years now


They probably lose a ton of traffic to AI or anticipate that happening. This is a way to keep people on Google search.

Like you, I use both search and AI separately. Even casual, nontechnical users are starting to work like that. Including AI with traditional search results will keep a lot of users from jumping ship in the first place and will help win back users from ChatGPT.

I know a lot of people hate AI - at a minimum, there’s a vocal minority - but the reality is AI is eating search like nothing we’ve ever seen.


I imagine most people aren't actually searching the web these days. They're searching for an answer to a question. They already know the 5-10 websites they use and go to those directly. They're mostly living in walled gardens, streaming services, or Amazon. When they use Google they want an answer and AI provides that.


> that can also search the web?

Slight digression: Claude/ChatGPT/etc all can search the web, but Google's AI already has a local copy of the web. It's much faster because of Google's TPUs, but also because Google has a copy of almost the entire web available locally. I recall others testing this and they observe that Google doesn't actually make HTTP requests to sites it references. It just uses its local cache. That's an advantage that all others seem to lack.

Of course, I agree that when I want search, I want search. But personally I've found if I want an LLM to very quickly answer a simple question, the type of thing all of them would do an equally good job on, I prefer Google's for its sheer speed.


I find it useful, and use it almost daily. Helps answer "how to" questions for working on my house, development or just general questions. If I need more info, I just look at the links or videos which are also right there.

To each their own.


> I truly don't get Google's move.

Because Google wants to kill off its search engine here. It is very clear.

> I assume it's a play to test the waters for how the ad market is going to work, because as a product I really can't see why I would ever use it. Dropbox comment moment incoming?

This assumes that Google search is still a high priority for Google. With their privatized adNetwork, they are trying to get people to trust them, and abuse users via their ads. That is their business model. Google is an adCompany. It stopped being a tech company many years ago already.

Also they control the adMarket for the most part. Just look at youtube.


On the flip side I retrained myself to ask LLM questions in my phone or computer browser search bar with the expectation of getting an LLM response to my question, with no desire to look at anything else.

If I truly want to search I will ignore the llm results, but I like the convenience of a quick llm search that knows "all the things". I get the answer to my question without searching multiple ad-ridden websites (since the ad provider does all the things)


"I truly don't get Google's move."

"AI" gets higher volume of use than search. This was disclosed by Google under oath

More traffic, more usage time, more data collection



Well, if the marketing teams are being told to reach people using AI or something like that, then Google is just playing to their real customers.


The intention is to kill the web in its current form, obviously. If only 1/3 of their users have left, then it is still a win for them in the long run, as they will gain the fraction of content they directly supply to users. Singularity is here and it's spreading faster than a cancer.


I don't see search and AI as fundamentally distinct things. Usually I just want an answer.


Maybe we use search differently, but I very often don't just want an answer, I want to find a website to help me. Maybe it is because I need to do business with a company and need to find their website to interact with them, or maybe I saw a cool site awhile ago that's relevant to what I'm doing now and didn't bookmark it (because I dropped that habit when Google search was good), or want to read the official documentation about a product I bought, that someone already put a lot of effort into making complete enough and digestible to a wide audience... and the LLM responses tend to get in the way.

Like the parent I use good/paid AI when I want an AI response. So, yeah, an omnibox that knows when I want "an answer" and one that knows when I want to find a thing sounds slightly more convenient than switching between two tools, but Google search is not that Omnibox.


If you don't care about the facticity of the answer, AI is less clicks, granted.


I don't think it's fewer keypresses though - google search would let you type two words and get the thing you know you want; an AI search doesn't really fit the mode that old school search folks were using


For the same reason I read a book instead of just the plot summary on the back cover


You really want to read the author's life story when searching for a recipe? Or wade through some content marketing plug for some vacuum cleaner shop in Albuquerque when all you want to do is figure out how to change filters on your vacuum? There are definitely gems on the web out there, but chances are I'm not discovering them via search, and I'd rather get the straight answer from the AI.


All of this stuff is Google's fault in the first place with page ranking shit they built!

So now you're trusting them to provide the cure?


Soon, the internet will be so completely full of AI crap enabled by the mega corps that search will be quite a bit less relevant anywho. Maybe google is trying to front run the demise of the internet that they were supposed to protect?


I thought the same at first, but now I find myself relying on the AI answer (as it is usually reliable) and, also more and more, I continue interacting in the AI mode on the topic that motivated my search in the first place.


They see AI killing the incentive for anyone to produce human-generated content so they're squeezing the last few bucks out of the internet as we know it before it finally goes belly-up.


My read on it is "AI is taking over internet content generation, and we can't filter because we'll end up filtering everything that makes us the most money"


I don't disagree with you, but google search has gone so downhill that I had stopped using it before they moved to the AI approach, which is actually pretty decent.


What if their move is to make AI search horrible so that OpenAI has no moves left here because trust in the product collapses?


> I truly don't get Google's move.

Because the goal is not to provide the best answers.

It's for users to train their AI.


initially, not a lot of people were using gemini

google pushed it into their other products to attract people to AI

there was and still are a decent number of people who haven't really used it, as crazy as that sounds


Bad results keep you on their site longer, increasing ad revenue.


> it's not Google Search

...and it really hasn't been for a good number of years now. I left a while ago when results were all SEO copy pasta blogs; this is just a final nail in the coffin.


they ruined search a while ago and they want to stop the bleeding


I'm not at liberty to talk more about the details, but last year I worked on a project to modernize a process that critically relied on a VBA macro to handle billions (yes, with a B).

> they run in such a sandbox

What makes them interesting is that they can talk with the outside world: API calls, databases, the terminal named after a former Democratic primary candidate...


> critically relied on a VBA macro to handle billions

Why is this surprising (or a secret)? It probably runs entirely bug-free and has done so for a decade or three - it would be hard to imagine still running if it regularly had issues or sent just a small percentage of those billions of dollars to the wrong place. What does your modernization do better?

