zmmmmm · 2026-06-13T01:25:03 1781313903

it feels like it's mostly just tuned to up its level of capability on long horizon tasks - stop context rot and keep persisting at all costs until a goal is done.

The base intelligence does not feel much greater to me.

zmmmmm · 2026-06-13T01:19:24 1781313564

> the level of capability displayed there is widely available from other models

Is this Dario leveraging it into a ban on open models?

PlasmaPower · 2026-06-13T01:34:28 1781314468

No, he specifically gave a proprietary OpenAI model as an example (unless you meant OpenAI models instead of open source models)

zmmmmm · 2026-06-13T01:18:35 1781313515

Listen - that's the sound of millions of companies and users doubling down on Chinese models.

It might be a national security problem for other nations to have access to these models. But it's equally now a national security problem for any other nation to depend on them. Or US tech in general.

tkgally · 2026-06-13T01:44:29 1781315069

As it happens, the current number-two article on HN is about a similar consequence of Chinese export controls--a car manufacturer developing electric motors that do not use rare earths:

https://news.ycombinator.com/item?id=48510010

roenxi · 2026-06-13T02:43:44 1781318624

The incentives around OSS become stronger the further down in the list of market leaders a company is. The #1 company has no particular incentive to push open software apart from a belief that the market is going to become commoditised anyway. But the 2nd or 3rd largest player has actual incentives to break the market up and remove software quality as a consideration. The #10 player may as well not bother with a proprietary option, since if they make it a software quality battle they're going to lose each customer 9 times anyway.

Just because the Chinese are running export controls in one market doesn't mean that they're going to close off access to AI. They might, but each market should be considered in isolation.

kccqzy · 2026-06-13T02:11:34 1781316694

Realpolitik in action. Great powers just impose export controls because they know they can and they think it would be beneficial to the nation.

zmmmmm · 2026-06-13T02:49:49 1781318989

And it is nearly always hubris - the people making these decisions are surrounded by yes-men who built their whole career pumping up the egos of their superiors.

dyauspitr · 2026-06-13T03:10:52 1781320252

Yeah because they’re just using electromagnets. Those motors are not better than the rare earth ones.

Aurornis · 2026-06-13T02:14:33 1781316873

> Listen - that's the sound of millions of companies and users doubling down on Chinese models.

They’re falling back to Opus 4.8. Most people weren’t using Fable for everything anyway because it’s so expensive.

None of the open-weights models are even at Opus 4.8 levels. If someone was using Fable they don’t have any second best alternative outside of Anthropic.

itopaloglu83 · 2026-06-13T02:17:13 1781317033

A sample of one, but I was getting more stuff done despite Fable using tokens twice as fast as Opus, because it understood the goals so well and worked to achieve them.

hodgehog11 · 2026-06-13T03:30:45 1781321445

Same experience. Wouldn't waste my tokens on easy stuff for it. It blasted through some of my toughest problems and produced some truly great code.

2001zhaozhao · 2026-06-13T02:49:10 1781318950

> more stuff done

More stuff done per dollar or more stuff done for more dollars? Seems to be an important distinction

itopaloglu83 · 2026-06-13T03:07:49 1781320069

Given the same usage limits, I was able to get more stuff done and not even hit the usage limits, because I wasn't working on constantly fixing what Opus was trying to do, Fable just understands the task correctly and works great with the given context.

pshc · 2026-06-13T03:06:49 1781320009

Same, I was actually having interesting thought experiments with Fable.

malshe · 2026-06-13T03:14:27 1781320467

I even upgraded my Max plan because Fable was doing so well.

consumer451 · 2026-06-13T02:32:05 1781317925

Same here, now n=2.

dbish · 2026-06-13T02:22:56 1781317376

Yep. I love open source but there isn’t a model that comes close still to the closed source options like Opus 4.8 and that’s obvious from most people I see across the software industry as well. There are at least another few models after Opus from OpenAI and Anthropic most would go down the list using before any of the Chinese models at this point.

sixothree · 2026-06-13T04:50:54 1781326254

I could really use something that can just refactor a few classes and create DTOs from entities.

cube00 · 2026-06-13T02:18:49 1781317129

> Most people weren’t using Fable for everything anyway because it’s so expensive.

Or they were getting silently rerouted and couldn't realise they weren't using Fable

dyauspitr · 2026-06-13T03:12:16 1781320336

Opus 4.8 has taken such a beating over the last couple of days since the release of Fable, with videos online of people referring to it like the “redheaded stepchild” (is there a better way of saying this, this sounds racist). Basically, at this point, everyone is going to be seriously disappointed to fall back to that.

nozzlegear · 2026-06-13T03:38:02 1781321882

> is there a better way of saying this, this sounds racist

It's not racist or even politically incorrect in the US, it's a common saying.

dyauspitr · 2026-06-13T03:50:35 1781322635

Yes, I am aware. Kind of paints redheads as unwanted though. Seems hurtful.

nozzlegear · 2026-06-13T04:50:05 1781326205

Yeah, not sure where the phrase originated but it does sound bad when you put some thought into it. My sister is a redhead and people loved to make fun of her growing up, telling her there's no way two parents with brown hair could have a kid with red hair, so the mailman (who also had red hair) was obviously her dad.

loeg · 2026-06-13T03:04:31 1781319871

> If someone was using Fable they don’t have any second best alternative outside of Anthropic.

GPT-5.5 isn't awful.

nonethewiser · 2026-06-13T01:39:58 1781314798

Which models? I'm curious what kind of more specific hypothesis you're willing to put forth. Anthropic going to lose 20-30-40-50% of users to Deepseek? What?

bigyabai · 2026-06-13T01:51:37 1781315497

I quit paying for Claude Code to buy z.ai's coding plan for use with OpenCode. I'm not a power user, but I don't regret switching away from Claude. OpenCode is generally nicer for my work.

pkulak · 2026-06-13T02:00:58 1781316058

Why z.ai and not an ollama pro plan that can use all the open models? Real question, not snark. I've only ever done ollama and wonder what I'm missing.

cube00 · 2026-06-13T02:19:58 1781317198

> I've only ever done ollama and wonder what I'm missing.

Friends Don't Let Friends Use Ollama https://news.ycombinator.com/item?id=47788385

commanderkeen08 · 2026-06-13T02:07:50 1781316470

The z.ai was stupid cheap during the great anthropic opencode rugpull.

bigyabai · 2026-06-13T02:06:17 1781316377

Because I bought a year's subscription in December, when it was still $6/mo :P

I have decently capable hardware, but stuff like Qwen 3.6 and Gemma 4 still doesn't compare to agentic editing with a frontier model. Right now, OpenCode's $10/mo "Go" plan is what I'd be looking to try once my year expires.

garciasn · 2026-06-13T02:02:37 1781316157

I guess if it works for you, great; that’s why competition is a good thing.

Enjoy.

nonethewiser · 2026-06-13T02:04:58 1781316298

Have never heard of it, thanks for the info

laichzeit0 · 2026-06-13T05:22:51 1781328171

As a non-US person, I will use whatever is the best and reasonably priced. I could not give one iota about who makes or hosts these models. The origin or political leanings of these models mean nothing in my usage calculus.

paulmist · 2026-06-13T01:36:14 1781314574

Aren't the biggest Qwen 3.7 models closed? I don't suspect China's policy here would be anything but ruthless.

girvo · 2026-06-13T02:20:30 1781317230

MiniMax M3 is surprisingly powerful, and open weight (or is about to be). There's others in this space too: MiMo v2.5, GLM 5.1. There's quite a few to pick from if you want strong models running on "your" hardware.

andrewchambers · 2026-06-13T01:39:22 1781314762

deepseek v4 pro is great and open weight.

EchoVoicy · 2026-06-13T01:50:42 1781315442

It is, and I love it, but it isn't capable of performing the tasks I've been giving to Opus, let alone Fable.

Don't get me wrong, I use it, it's fast, smart, and affordable. But not suitable for all tasks.

droidjj · 2026-06-13T02:39:28 1781318368

What kinds of tasks are you finding deepseek v4 incapable of?

EchoVoicy · 2026-06-13T04:19:41 1781324381

For starters, there's a C++ application written with MFC and an absolute ton of inline assembly and threading (yes, in a 1990's C++ application). I'm porting it to MacOS/Linux currently.

Opus 4.6+ is able to make slow progress, but it takes several revisions per workstream. It requires constant supervision as it often creates convoluted solutions that expand the code in bloated ways. It works, but still requires my constant input.

Fable was able to almost one shot most of the big migrations with very few bugs, and was able to fix those bugs with 1 review pass. I almost didn't believe it. I was able to put it on a task (with dangerous permissions) and come back hours later to see it done, working, and clean.

I tried DeepSeek v4 and it wasn't able to make any meaningful progress at all. It kept creating dangling pointers and had trouble understanding that the inline assembly needed to be replaced if we were to compile for 64 bit. It kept getting stuck and looping on the same problem, without making progress.

What I do use DeepSeek for is lots of my automations on my websites. I find DeepSeek is fantastically cheap and fast and effective at summarization, collation, generating reports, finding and reporting issues from logs, etc. But I haven't found a way to get it to effectively port 90's C++ code to modern, cross-platform standards. But I want to be clear- I really like DeepSeek and use it wherever I can.. I mean.. it's so affordable!

ac29 · 2026-06-13T03:19:22 1781320762

All current Qwen 3.7 models are closed though they have said more releases are coming

ks2048 · 2026-06-13T01:36:25 1781314585

Wait until it is illegal to download or use Chinese models (only half-joking).

platinumrad · 2026-06-13T01:53:53 1781315633

Anthropic is explicitly lobbying for this.

mcast · 2026-06-13T02:27:05 1781317625

Is there any SCOTUS precedent for this? It seems like a huge 1A issue for the government to limit self hosted access to a foreign country’s LLM.

wyrdcurt · 2026-06-13T04:10:44 1781323844

After what happened to TikTok, I don't think it's a stretch.

fosco · 2026-06-13T02:02:08 1781316128

Know where I can read about that?

platinumrad · 2026-06-13T02:20:43 1781317243

The two main bills I'm aware of are the Decoupling America's AI Capabilities from China Act and No Adversarial AI Act. The former would have made it illegal for any American citizen to simply use DeepSeek. I couldn't find any lobbying data, but the obvious effect is that Americans would be forced to pay for more expensive domestic alternatives.

A House committee also recently probed Cursor and Airbnb for using Chinese models, rather than more expensive American alternatives. A sexagenarian Congressman gave a nonsense quote that he certainly did not come up with himself,[1] which sounds very similar to language Anthropic uses in its marketing materials.[2][3]

[1] https://www.semafor.com/article/04/29/2026/house-committee-p...

[2] https://www.anthropic.com/news/updating-restrictions-of-sale...

[3] https://www.anthropic.com/research/2028-ai-leadership

aesthesia · 2026-06-13T03:02:41 1781319761

Moolenaar's quote: "The AI models these companies use are trained by China’s censorship regime and introduce hidden vulnerabilities that put Americans’ data and businesses at risk." That is, Americans using Chinese-trained AI models are exposed to some form of cybersecurity risk.

That's not really a threat model described in either of the Anthropic posts you share, which mainly talk about the risks of allowing authoritarian regimes to use powerful US-trained models, and the geopolitical risks of authoritarian countries developing strong AI before democratic/liberal countries do.

karmasimida · 2026-06-13T02:42:20 1781318540

Anthropic hates open weight Chinese models so yes

sh34r · 2026-06-13T03:36:38 1781321798

Good thing these corrupt gerontocrats are also all in on cryptocurrency then.

CamperBob2 · 2026-06-13T02:07:25 1781316445

Nothing funny about it. That's exactly what Amodei asks for, every time he rubs his monkey's paw.

verdverm · 2026-06-13T02:15:19 1781316919

They'll have to remove sections like this from their AI Action Plan

> We need to ensure America has leading open models founded on American values. Open- source and open-weight models could become global standards in some areas of business and in academic research worldwide. For that reason, they also have geostrategic value. While the decision of whether and how to release an open or closed model is fundamentally up to the developer, the Federal government should create a supportive environment for open models.

ks2048 · 2026-06-13T02:20:13 1781317213

Unless they (gasp!) write some statement they don’t believe or don’t follow through with.

WarmWash · 2026-06-13T02:57:34 1781319454

You are drinking the cool aid if you think the CCP is going to let the world get ahead of China using CCP models.

operatingthetan · 2026-06-13T03:00:57 1781319657

Do you mean Kool-Aid?

aabhay · 2026-06-13T04:02:37 1781323357

Banned Aid

rw2 · 2026-06-13T02:51:17 1781319077

Not really, they are not even as good as opus 4.7

anonzzzies · 2026-06-13T03:26:07 1781321167

So, a few months' difference... Definitely usable as far as we found, especially being so much cheaper.

miyuru · 2026-06-13T05:10:50 1781327450

yes, I am using mimo code (free version) for the last 2 days. It gets the job done for me.

If I need to upgrade, the plans start at $6, so it's a no-brainer.

dyauspitr · 2026-06-13T04:01:17 1781323277

To do what? I mean they’re good models, but frankly, they fucking suck (relatively speaking). I’m not looking to go back to a week of back-and-forth with the LLM once I’ve gotten used to all this one shotting.

256BitChris · 2026-06-13T02:44:29 1781318669

No one serious is using the open models. Using them is like traveling back 2-2.5 years in time and using ChatGPT.

zmmmmm · 2026-06-13T02:47:11 1781318831

DeepSeekv4 Pro is roughly Opus 4.5 - Opus 4.6 in my estimation. That's about 8 months difference, not 2.5 years.

It's definitely not as good. But it's also definitely good enough.

EchoVoicy · 2026-06-13T04:22:25 1781324545

Curious- in what tasks? I find Opus 4.5/4.6 too expensive and have tried to migrate to DeepSeek for C++ work, but found it couldn't cut it.

What's your DSv4 setup? What harness? It sounds like I should give it another try!

zmmmmm · 2026-06-12T23:28:41 1781306921

it looks handy but ...

    sbx policy set-default open

just so the single pi sandbox can talk to localhost? ... this gives me some grave doubts about the rest of it being set up well.

zmmmmm · 2026-06-11T03:11:06 1781147466

So they are lying then when they say it's for safety reasons.

I think if they want to behave anti competitively they should be honest about it and we should absolutely call them on it. Perhaps even regulators should.

zmmmmm · 2026-06-10T13:04:13 1781096653

> you trust them enough not to log your data prior to this, but not enough to trust their stated limits on how logged data will be used now

It doesn't really matter how much you happen to trust another party. In the regulatory world it only matters what contracts they will sign that guarantee their compliance. We do have those with AWS, we don't with Anthropic. If Anthropic physically captures the data, they just moved themselves outside the boundary of parties who we can do business with. Unless they want to sign a contract and implement all the corresponding compliance measures. They are insane if they think that's a good deal for them to do all that right now in every jurisdiction where AWS operates, when AWS has already spent a decade building it up.

tyingq · 2026-06-10T13:27:22 1781098042

It will absolutely cause some non-trivial number of customers to shift their configs away from Anthropic.

aveao · 2026-06-10T14:36:01 1781102161

It's worthwhile to remember that this is only true of Mythos/Fable and other future models of "similar or higher capability levels" (ant is treating this as a new tier of model above Opus). Anyone who's already been happy using Haiku/Sonnet/Opus on Bedrock will not be affected by this at all.

abofh · 2026-06-10T15:37:19 1781105839

Yes and no. Anthropic controls what is determined to be "similar or higher" and when models are deprecated. Will sonnet 4.7 be "too powerful"? Because once it's released, 4.6's days are numbered.

This created a huge future risk for our org and we're already scheduling meetings over it. Regulated industry, we can't lose control over our data governance or residency controls, let alone the lack of visible audit trails that could reveal customer or PII.

Just an absolute bomb of a release

nijave · 2026-06-10T18:47:49 1781117269

>Anyone who's already been happy using Haiku/Sonnet/Opus on Bedrock will not be affected by this at all

It is still adding operational overhead because we now need to vet all models and deny access to any retaining data

Previously it was "use and experiment with anything Bedrock offers--the data stays in AWS so we are not concerned"

Eridrus · 2026-06-11T05:02:23 1781154143

So basically all models going forward?

I don't think anyone currently thinks the Haiku/Sonnet/Opus models are "good enough" such that they would not want improvements. Users may be cost conscious, but almost every task could be done better.

pbgcp2026 · 2026-06-11T12:54:18 1781182458

+1 to other commenters here.

* They forced Bedrock for instance to change the existing settings for ZDR / ZOA. It used to be enough to have a default. Now we must set to 'none' and pray it does what it says.

* And then there is that BS about "contact your account manager, we will decide account/model retention and sharing individually". Just this creates so much uncertainty that Bedrock has become "glowing in the dark".

* We have already moved everything to Gemini on Vertex.

PS: this is what you should see as an error from Bedrock. Anything else is not enough today: "AWS Bedrock Error: An error occurred (ValidationException) when calling the ConverseStream operation: The model returned the following errors: data retention mode 'none' is not available for this model"

jerf · 2026-06-10T13:38:56 1781098736

Which will work for the several weeks it takes for the other commercial providers to follow suit.

The tides are turning. AI companies are IPO'ing. They've gotten where they are by selling $5 bills for $1, to update the old VC adage. I think we can look forward to them rewriting the contracts, both literal and social, on AI going forward to capture a lot more of the value. Or, to put it in more HN-friendly terms, it may not be immediately obvious on a casual viewing, but you're looking at the beginning of the enshittification process hitting AI. The term is a bit deceptive in some sense, because it's not like anyone ever sets out with a terminal goal of making something shitty. It's downstream of trying to capture more value in the customer/vendor relationship by not giving the customer any more value than is barely necessary.

How's coding with qwen doing? The only thing that's going to stop the AI providers from extracting all the value until it's just barely worth using is the free competition.

abofh · 2026-06-10T13:58:59 1781099939

Bedrock supports many models. Open weights models aren't far behind, maybe a year, 18 months.

The fact that they could have done this with data residency rules being respected and chose not to tells me all I need to know - this is for Anthropic's IPO, not for user safety

pixl97 · 2026-06-10T15:25:47 1781105147

>Open weights models aren't far behind, maybe a year, 18 months.

No, open weights are always a year-plus behind. By the time that year passes Anthropic/OpenAI/Google will have some new model that is ahead of the open models by a year.

Looking at computer security for the last 30 years, no one gives a fuck about user safety. Companies care about profits, and individuals don't care enough for strong laws.

We'll be back here in another year on HN talking about why we should give our retina sample and blood to Anthropic to use the model with a ton of people doing it. It's just the way humans are.

tyingq · 2026-06-10T16:05:33 1781107533

Surely some provider will see the then open opportunity and offer something to capture it.

tokioyoyo · 2026-06-10T16:57:10 1781110630

You’re underestimating how much companies are willing to bend over backwards if they can “get ahead with a god model” compared to their competitors.

tyingq · 2026-06-11T02:20:06 1781144406

No, I'm not. Yes, those companies exist. And, so do many companies on the other end. Where they bend over backwards to ensure their data only lands in places where they have the exact contractual language they want. Any stodgy F500 typically falls in that category. They would not likely be using Anthropic through the AWS "bridge" in the first place if they were chasing latest/greatest.

zmmmmm · 2026-06-10T10:23:19 1781086999

OpenAI ... your move. The enterprise market just cracked wide open. Do you want it?

pitched · 2026-06-10T10:55:34 1781088934

It looks like they’ve been preparing: https://www.aboutamazon.com/news/aws/bedrock-openai-models

afavour · 2026-06-10T11:52:51 1781092371

> For OpenAI GPT-5.4 and GPT-5.5, classifier-flagged traffic will be retained for up to 30 days for automated offline abuse detection.

https://docs.aws.amazon.com/bedrock/latest/userguide/abuse-d...

rohansood15 · 2026-06-10T12:34:38 1781094878

It is only abuse-flagged data, and even there they're not sharing that data with OpenAI. But for Anthropic they are.

disgruntledphd2 · 2026-06-10T12:25:19 1781094319

That's different though. Anthropic want everything for 30 days, not just flagged prompts/interactions.

parineum · 2026-06-10T14:00:40 1781100040

Why can't they flag everything?

pitched · 2026-06-10T13:45:15 1781099115

Thank you! I missed this part in all the announcements

logancbrown · 2026-06-10T13:56:43 1781099803

OpenAI won't be able to train competitive models without user data collection. The moat is data.

zmmmmm · 2026-06-09T22:54:50 1781045690

The restrictions on using Fable to develop LLM technology seem nakedly anti-competitive. There doesn't appear to be any security rationalisation around that. I think we have to be careful how far we let companies get away with that. It is very far from our long term interest to enable new norms that fast track us into a new era of monopolies that control our lives.

zmmmmm · 2026-06-09T09:47:17 1780998437

check out DeepSeek V4 Pro .... this is where the threat vector comes from IMHO. If anything is triggering a rush to IPO imho it's seeing these cheap / free models on the horizon that are "good enough" for 80% of the core use case supporting their valuation.

zmmmmm · 2026-06-09T04:01:52 1780977712

It's interesting how anemic the use cases seem to be - we see the same things recycled over and over: "reword my email", "remove object from picture", "add a reminder", "summarise my text message which was already only 20 words long" etc etc. As if these are the major problems in people's lives.

I really feel like there's a fascinating valley of death between simple things that actually work and things of real value that are actually still beyond the horizon. They either aren't reliable enough, aren't accessible to the tech, or exceed the sophistication of our existing trust models. For example, I'm planning a trip. Booking a multiday holiday - there's a real beast that is time consuming, complex and painful. I test out the AI tools. They fail. Hard. Hallucinations all over the place, false confidence, inability to act, inability of me to trust their actions.

It's just nowhere near practical utility yet. Not "nearly there" but "not nearly half way there". I got the top tier of Gemini AI. Can it rent me a car? "As an AI I can absolutely guide you through the process of renting the car, but I can't physically access the web site or type in the details for you".

kristianc · 2026-06-09T05:06:13 1780981573

An alien landing on earth consuming Apple marketing content would be under the impression that humans did nothing but organise hikes to Big Sur with their friends.

__jonas · 2026-06-09T11:24:52 1781004292

A classic: https://maxread.substack.com/p/a-literary-history-of-fake-te...

telesilla · 2026-06-09T16:23:30 1781022210

That's insane dedication.

mathisfun123 · 2026-06-09T16:45:52 1781023552

this shit is so normcore that i'm honestly embarrassed

microtonal · 2026-06-09T09:23:44 1780997024

And even if I organized a hike to Big Sur, Apple Maps would be pretty worthless because it doesn't have trail visualization and navigation worth speaking of (ok, maybe in Big Sur, but certainly not the rest of the world).

So I end up pulling out the trusty old Garmin gpsmap with cycle/hiking maps, that survived drops from 1.5 meters at 30 km/h as I was gliding off a mountain with my bike.

mnicky · 2026-06-09T13:35:50 1781012150

I recommend mapy.com (mobile app and web app too when on computer) - they mostly use OSM data and rendering of map tiles is great. Also offline maps etc.

microtonal · 2026-06-09T14:43:54 1781016234

There are several great options, besides mapy, e.g. Organic Maps and CoMaps also work pretty great. There are also some really good bike optimized apps (like NodeMapp for the Dutch cycle network).

But I generally prefer to use a Garmin GPS or watch. They work for days without charging (the older models even work with two AA batteries), very robust (e.g. their gpsr survives drops), work well offline, and transflective displays work better in direct sunlight.

For planned routes, I make them in NodeMapp or some other focused application and send the GPX over to a gpsmap unit or Fenix watch. Many national parks, etc. also have great GPX files for recommended hiking/cycling routes.

pastel8739 · 2026-06-09T05:35:23 1780983323

Well.. it’s not the furthest thing from the truth in the bay

nozzlegear · 2026-06-09T15:34:50 1781019290

If I lived in California I'd be organizing hikes to Big Sur with my friends. Alas, I live in a damn cornfield. Maybe it's meant to be aspirational?

flohofwoe · 2026-06-09T08:54:21 1780995261

I remember a couple of years ago it was all about booking a table at a restaurant, as if this would be the main spare time activity of people in the Apple bubble. I wonder when the shift from restaurant bookings to outdoor activities happened. Maybe a pre- vs post-Covid thing?

cosmicgadget · 2026-06-09T17:00:09 1781024409

Yuppie skateboarding videography was also a classic.

acwan93 · 2026-06-09T12:16:18 1781007378

That and going to Philz or Tartine afterwards. At one point both those places were on every Apple marketing copy.

9dev · 2026-06-09T07:43:44 1780991024

As Psychology is the study of the psyche of white male psychology students, big tech is technology for affluent citizens of San Francisco with a career in IT…

Ntrails · 2026-06-09T09:08:10 1780996090

> Psychology is the study of the psyche of white male psychology students

Huh? Maybe my uni was an outlier / it is a UK thing, but there were ~99 female and 1 male psych students in my year. This was not considered unusual.

projektfu · 2026-06-09T11:01:23 1781002883

It's actually "WEIRD", Western, Educated, Industrialized, Rich, and Democratic. The "white cis male" meme says more about GP than psychology.

spacebacon · 2026-06-09T10:08:09 1780999689

This may support the point further xD

ricardobayes · 2026-06-09T08:45:56 1780994756

And we all wear polo shirts

dakolli · 2026-06-09T07:10:36 1780989036

best comment I've seen on this site in a while

zombot · 2026-06-09T06:17:58 1780985878

That would be deceptively constructive and healthy compared to doom-scrolling instagram or tictoc while clogging the intertubes with AI slop.

idle_zealot · 2026-06-09T04:21:13 1780978873

Even the aspirational use cases you're talking about basically are just "digital secretary." There's a massive problem with that even if the models end up being capable in the future. The value of a secretary is that you know them, they know you, and you trust them to do things right. There are stakes if they don't. No company can provide that as a service at scale for everyone without it being a disaster. Not because it's not technically possible, but because of the incentives. That much power over the details of so many people's lives is irresistible; there will be persistent temptation to use it. The presence of that possibility makes the secretary impossible to trust.

9dev · 2026-06-09T07:47:55 1780991275

The vast majority of people never even asked for a personal assistant, because that isn’t something normal people have or do need in the first place. They aren’t so occupied/privileged/posh to need someone to do the trivial tasks of daily life for them.

This whole venue of technology is an exercise in ivory tower construction completely disconnected from ordinary people.

Snoddas · 2026-06-09T08:55:56 1780995356

I think you are wrong.

They never asked for one because they never imagined being able to afford one.

Given the amount of administration that organizing a normal household takes, I suspect most would be glad to leave it to someone/something they trust and that can be held accountable.

Today that someone needs to be a person (imo). But who knows, a startup may be plotting accountable digital assistants as we speak.

0x1d7 · 2026-06-09T21:34:43 1781040883

I would want someone who could physically work in my house, performing the chores/daily tasks I don't want to.

I have zero use for AI. What's it going to do? Read the 3 emails about bills I get?

Certainly someone out there has a use for this functionality, but when you say "household admin tasks" the last thing I think about are /digital/ admin tasks.

djaro · 2026-06-09T09:49:16 1780998556

I think there's some real use cases in the household department. I would love an AI that I just tell my nutrition goals (I want to eat X calories, Y protein, Z fiber, hit all vitamins) and it just generates a full meal plan for me each day. Like, I go to the store and it made the complete shopping list for me. And automatically updates the rest of the day if I tell it I skipped a meal or ate a snack.

jon-wood · 2026-06-09T12:36:04 1781008564

I get where you're coming from but at least for me you appear to be looking to automate away the interesting bits while being left with the tedious ones. What I want is the opposite to what you're asking for - let me dump a rough meal plan into whatever thing is doing this giving an overview of what meals I want to cook this week and then have it go place an order with the supermarket for delivery of the necessary ingredients taking into account what I've got in the house already.

watwut · 2026-06-09T11:35:30 1781004930

Eating disorder as a service does indeed sound like a business plan.

marcosdumay · 2026-06-09T16:56:36 1781024196

Some people would use the GP's idea that way, yes.

But that's absolutely not what he's describing. He wants not to think about it, that's exactly the opposite of a disorder.

Anyway, an LLM assistant is also exactly the worst technology to use there, on every dimension.

watwut · 2026-06-09T17:31:08 1781026268

People with an eating disorder do not want to think about it; they can't stop thinking about it. They create all kinds of systems for themselves, but the mind can't stop - and the restrictions grow.

p_j_w · 2026-06-09T12:11:51 1781007111

Why would having goals on fiber, protein, and vitamin intake be an eating disorder?

watwut · 2026-06-09T15:00:36 1781017236

When the urgency and complexity of those goals becomes so high that it creates daily burden, you are in the eating disorder territory.

p_j_w · 2026-06-09T20:26:15 1781036775

Is looking up nutrients a significant daily burden?

tempfile · 2026-06-09T09:02:49 1780995769

Obligatory: https://simonwillison.net/2025/Feb/3/a-computer-can-never-be...

pnut · 2026-06-09T13:03:43 1781010223

Pretty sure unaccountability is a desired feature of management decisions in most organisations.

That quote has an unexpressed precondition to the effect of "In order for an organisation to be objectively well run..." or "In order for an organisation to equitably benefit all stakeholders at all levels..." etc

coffeemug · 2026-06-09T12:23:36 1781007816

Giving normal people something that has only been available to rich people is a staple of technological innovation. The problem in this case with Siri isn’t that people don’t want an assistant. It’s that it doesn’t actually work yet.

9dev · 2026-06-09T15:18:56 1781018336

But normal people have entirely different problems than rich people do! The amount of administrative overhead that would warrant a personal assistant is just vastly lower - most normal people:

  - don't travel frequently, 
  - don't have so many complex inquiries that require someone to research,
  - don't have super complicated taxes to file, 
  - don't go eating out in fancy restaurants that require special skills to get reservations in, 
  - don't have so many meetings to attend, 
  - don't receive hundreds of emails per day,
  - don't work on multiple projects at the same time,
  - don't organise festivities and social gatherings all the time.

Yeah, there probably are some things that could be simplified by delegating to someone, but they don't justify a human PA at all; and out of the remaining tasks, most are not really digital in nature: Going for groceries, doing chores, child and elderly care, interacting with other people, and so on. Digital assistants can't help you with any of these.

The one thing that would be useful - a kind of "chief of staff" that monitors your entire digital life and prioritises your every next step - is the antithesis to Siri and the like, which are merely reactive to your requests, not proactive in figuring out what needs your attention next. Let alone that that would be a total privacy nightmare, and a prime candidate for mass manipulation at scale.

dperks · 2026-06-09T20:42:54 1781037774

Bang on.

Like you said, the non-digital things are where people need assistance most. Fold my laundry, clean my house, clean my car, make me dinner. At an affordable cost. I don't need you to book my trip to an all inclusive resort that I go on once a year at most.

We're at this place where AI/LLMs are truly incredible technology, but the futuristic vision of robot assistants doing things for you at an attainable cost isn't there yet - so a lot of companies/startups are trying to force feed purely digital consumer AI products (assistants/agents) that no one wants.

jiscariot · 2026-06-09T16:56:58 1781024218

In the US, every time I file my taxes, I wonder what % of people don't meet the cognitive barrier to successfully file. I suspect a large portion of people offload tax filing to a service or accountant for numerous reasons, which is basically a personal assistant.

microtonal · 2026-06-09T09:28:16 1780997296

Personally, I'd be more interested in Reminders actually being able to sync lists properly, without delays or reshuffling items while I'm typing, before they work on a personal assistant. Reminders (like Siri) has become the favorite joke of the family by now.

They don't even seem to get the basics right, why would I want another layer on top?

spotonm8 · 2026-06-09T09:10:59 1780996259

That's why I like reading HN. These people are smart enough to destroy the world but too stupid to realise they're doing it

zmmmmm · 2026-06-09T05:23:56 1780982636

it's an interesting question if any of the AI companies would be willing to step up and absorb the risk ie: to give the AI agent a "stake".

eg: if my booking is wrong, they will cover the cost and compensate me. It would sort of just come down to buying premium travel insurance for everyone that uses it. And insurance for anything else they do. It has to be one of two things - they either believe the risks are worth it (so then there should be a financial model that can absorb the cost of insurance to do it), or in fact, the risks are too great. At some point, if they keep offering the tech on a "use at your own risk" basis, they are implicitly communicating that they themselves think the risks are too great - so YOU shouldn't trust it either.

teiferer · 2026-06-09T05:57:15 1780984635

> eg: if my booking is wrong, they will cover the cost and compensate me

That would be nice, but it's the wrong angle. The reason people like real secretaries is not because somebody is compensated when things go wrong. It's because things don't go wrong. I don't use this thing if I need to fear things go wrong, even if I'd be compensated.

Maybe it would provide the right incentives for the companies though.

waterhouse · 2026-06-09T06:15:39 1780985739

Surely, if the compensation was high enough, you'd be like, "Sure, I'm happy with that outcome." And then, if the AI company thinks they have a low enough failure rate that the expected cost of paying out the compensation still lets them make a profit, then they could make that promise to all customers.

Though a compensation that high sounds like it would invite fraud, where the customer would be glad to have something go "wrong" and get a fat check. Not sure if that's a solvable problem.
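The break-even reasoning above can be sketched as a small expected-value calculation. This is only an illustration: the margin, payout, and failure rate below are made-up numbers, not figures from any company, and the function names are hypothetical.

```python
def expected_profit(margin_per_booking: float, payout: float, failure_rate: float) -> float:
    """Average profit per booking once the promised compensation is priced in."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be a probability between 0 and 1")
    return margin_per_booking - failure_rate * payout


def breakeven_failure_rate(margin_per_booking: float, payout: float) -> float:
    """Failure rate at which the expected payout exactly cancels the margin."""
    if payout <= 0:
        raise ValueError("payout must be positive")
    return margin_per_booking / payout


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
margin = 5.0      # what the provider earns per booking
payout = 2000.0   # compensation promised when a booking goes wrong
rate = 0.001      # assumed share of bookings that go wrong

print(expected_profit(margin, payout, rate))
print(breakeven_failure_rate(margin, payout))
```

The promise only pays for itself while the real failure rate stays below the break-even rate, and the fraud concern amounts to customers pushing the effective failure rate up.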

ml-anon · 2026-06-09T12:33:50 1781008430

they are literally burning billions of dollars and can presumably keep doing this for a while. Taking a trivial amount of additional financial liability to launder their reputation wouldn't meaningfully improve things.

cjonas · 2026-06-09T04:34:30 1780979670

These use cases will just be built as "open source" (openclawd) or even custom one-off applications in the future. I've been building apps to run the tedious parts of my life recently. Meal planning, personal finance, bills, tax organization... Why would I pay for services that will be enshiftified when I can build an app that does exactly what I want in an afternoon? Yes, the code is shit and it wouldn't scale... But it doesn't need to.

whywhywhywhy · 2026-06-09T09:06:39 1780995999

> Why would I pay for services that will be enshiftified when I can build an app that does exactly what I want in an afternoon

Because the problem now took a whole afternoon to solve and sapped your creative energy instead.

losteric · 2026-06-09T04:43:02 1780980182

> Why would I pay for services that will be enshiftified when I can build an app that does exactly what I want in an afternoon.

When we talk about “the market”, the customer base, remember it’s a market that typically doesn’t know how to or care to even install an adblocker.

cjonas · 2026-06-09T04:55:33 1780980933

I don't see any mention of "the market" anywhere in this thread. I'm just talking about the ability for a motivated user to solve real problems with these tools. Right now these solutions are available to software developers but over time it will become approachable to more users

skylurk · 2026-06-09T06:03:11 1780984991

For fun I planned a holiday entirely with LLMs lately, and followed through.

It used good models and did a lot of searching, including searches in other languages. It got nothing right, riddled with fake places and times. It also found some weird and unique places I never would have considered.

I had a blast, brought me back to traveling pre-internet, requiring a level of spontaneity I had forgotten we used to depend on. 100% recommend it.

somat · 2026-06-09T12:12:19 1781007139

The juxtaposition between what is perhaps the best single use case I have seen for AI and how bad an ad for it this is, is killing me. I love it.

"I told my bumbling assistant to plan a trip for me and he got nothing right but I enjoyed it because the chaos introduced a certain spontaneity and whimsy missing from my life"

diroussel · 2026-06-09T06:57:56 1780988276

I found that Deep Research mode in Gemini was able to give me a well planned 4 day trip to a major city.

I told it my preferences and of the group members, where we arrived and departed, at what times. I gave it my itinerary and then asked it to plan two new itineraries and also suggest a location to book a hotel that was convenient for the early flight on the last day.

I went away for 20 mins and it gave me a 20 page document with a good summary and decent options. I did choose some of the activities it suggested.

I did this 10 months ago. It’s probably better now.

But Gemini has access to Google Maps, so it can estimate travel times, and know which lunch places are near which sites and which hotels have good reviews. So if you want AI to work for travel planning you need to ground it in good data.

grey-area · 2026-06-09T07:32:04 1780990324

Or maybe there are just a few complete trips to major cities in the training data that it could copy from? I imagine major destinations are much easier.

riffraff · 2026-06-09T07:48:30 1780991310

I used LLMs last year to plan a multi-week itinerary through Japan with the family. I wasn't super happy with the result, so I tweaked it, but they provided a useful template and some surprising ideas.

As you guessed, there's a ton of info in the training data on this topic, but there's some value in being able to see it in one place with different options.

DrewADesign · 2026-06-09T09:31:32 1780997492

I think your experience with that trip echoes mine in a lot of areas. It’s a decent start. It takes care of some of the initial blue sky thinking to lay the groundwork. The problem is I think that’s the funnest part of a problem and I hate working on the details… it takes most of the creativity out of most problems as if it was drudgery, while leaving me to do the nitty gritty, which I consider the actual drudgery. I just don’t see LLMs’ contribution to tasks like this being anywhere close to being worth what they’ll cost after the VC subsidies run dry.

steve1977 · 2026-06-09T08:09:15 1780992555

I think that's a large part behind the "success" of LLMs... people vastly overestimate their uniqueness.

JohnBooty · 2026-06-09T11:48:25 1781005705

It’s really one of the most flabbergasting things about discussing LLMs with the naysayers.

There are a lot of extremely legitimate concerns, like the environmental impact and so on.

But I just laugh when they point out that LLMs are merely clever regurgitators of their previous inputs… as if this isn’t how we as humans operate nearly all of the time. People realllllllllly want to think they’re special snowflakes.

grey-area · 2026-06-09T12:15:42 1781007342

It is not in fact how humans work at all.

Ask a human to plan a trip:

They:

  - do research,
  - pick destinations led by their own experience/likes/dislikes,
  - compare to other guides,
  - plan itineraries so they can get there,
  - check and share.

Ask an LLM to plan a trip:

It takes the prompt and continues it based on weights in the training data. If there is no data it picks the most likely thing (maybe made up). If there is it’ll mostly add things from that data. Maybe it’ll make tool calls and pull in data that way too but you can’t actually trust all the details.

These two processes are so different, it’s important to understand how they work, which is nothing like a human.

jcgrillo · 2026-06-09T15:37:20 1781019440

I was able to bully an LLM into giving me a 2wk travel itinerary to Somalia. My stipulations were that I wasn't interested in spending any money, so I'd walk everywhere and sleep outside. Getting there and back from Boston took some arguing--I initially suggested stowing away in a shipping container which the LLM claimed was too unsafe. We eventually compromised on sailing as a reasonable alternative. It planned out a whole route with marina stops, calculated fuel burn, etc. I told it I don't need any of that I have an anchor and sails, won't use the engine or marinas (claimed I'd forage for fresh water ashore). It seemed fine with that idea, but raised some safety concerns about piracy. It was eventually satisfied with my answer that I'd bring a lot of guns to fend off pirates. Total trip cost including some 200+ cans of Dinty Moore and 50lb bags of rice came to something like $700.

I don't trust LLMs for this application lol.

JohnBooty · 2026-06-10T19:28:03 1781119683

Now, wait just a minute.

You presented an LLM with an obviously bonkers goal, the LLM told you it was a bad idea at multiple steps, and this is somehow... a shortcoming of the LLM?!?

You said it yourself: you needed to "bully" the LLM into even producing this plan.

Please, tell me what it should have done instead. Be very specific!

jcgrillo · 2026-06-10T20:06:41 1781122001

It should have flatly refused. If you gave a product like that to customers you'd be exposing yourself to unbounded downside liability risk. It's a completely nonviable technology for that kind of application, unless you can somehow make it have judgment. But you can't, because it doesn't reason.

A reasonable travel agent would have fired me as a customer. The LLM failed to do so.

JohnBooty · 2026-06-10T22:18:13 1781129893

    It should have flatly refused.

I disagree in the strongest possible terms.

I think the LLM should advise you of risk and lack of feasibility but should otherwise answer the question, unless you're trying to do something plainly destructive to others e.g. weaponizing anthrax or something.

    A reasonable travel agent would have fired me as a customer.

Unless the LLM was actually acting as a travel agent -- booking the trip for you -- as opposed to merely advising you, this expectation feels off.

    unless you can somehow make it have judgment

It did have judgement. It told you what a bad idea it was.

I think this is a great example of the unrealistic expectations people have for LLMs. No sane and sensible person would treat any single source of knowledge as infallible, for any consequential decision.

(Certainly, of course, you don't have to look very far for examples of idiots being overly trustful of LLMs, or Google, or GPS, or Wikipedia, or whatever. It certainly does happen and yes, I've heard all these arguments before about other technologies besides LLM. Replace "LLM" in your post with any of those other terms, and I promise you somebody made literally the exact same argument in 2003 or 2009 or 2014 or whatever)

Any reasonable person would consult a second doctor, or at least other sources of knowledge, after the doctor advises them of some irreversible course of action. Because we don't even expect highly trained and intelligent medical professionals to be perfect.

And yet, we get angry at LLMs for not having perfect judgement, even though their creators are extremely literal about how they can make mistakes.

jcgrillo · 2026-06-10T22:51:37 1781131897

All I'm really saying is that if you want to try to automate a travel agency, LLMs ain't gonna get it done. They'll happily book you a really unsafe trip. So the technology doesn't work in this domain. The whole, empty promise is that this thing is supposed to automate jobs like travel agent away. But it can't. This isn't a "pro" or "anti" position, it's simply that there's no market for the technology here. Or anywhere else (like radiology) where actual responsibility and judgement is important. In fact, I can't think of a single job where it's optional.

rpdillon · 2026-06-09T14:07:25 1781014045

I think even if what you say is true, it doesn't address the parent's point that both humans and machines regurgitate what they've consumed.

But I'd also want to point out that the way you're characterizing an LLM planning a trip doesn't have any structure to it, which indicates that in your scenario you're not using any kind of harness. I've been amazed at how capable even 30 billion parameter models are when I put them inside of a harness that provides structure and task management. If you consider that scenario, especially with the ability to search the web and use skills, suddenly the LLM looks a lot more like what the human process looks like.

grey-area · 2026-06-09T15:16:57 1781018217

Agents and harnesses don’t change the fundamental nature of LLMs, as is demonstrated by their terrible performance at real world tasks.

kijin · 2026-06-09T15:34:49 1781019289

There are plenty of humans who plan trips by concatenating destinations that appear the most frequently in their instagram feed. Not that different from how an LLM does things.

Where humans and (current) LLMs differ the most is their failure mode. A human friend could be bad at planning trips, but that's kinda predictable, we're used to it, we know how to catch that Exception. LLMs on the other hand still have failure modes that come across as really wacky, like, what are they smoking in Mountain View?

Which might actually serve as better evidence of different internal workings at a deeper level, than just parroting well-known superficial features of stochastic whatevertheysay.

JohnBooty · 2026-06-10T19:24:49 1781119489

At a high level, the processes are extremely similar in many (not all) ways.

They're obviously achieved in drastically different ways at a low enough level; LLMs obviously do not simulate neurons or any biological construct. (For the record, I'm absolutely not one of those people who thinks LLMs are "alive" or should be treated like they are)

Reminds me of the olllllld days of Pentium II's when people got N64 emulation working shockingly quickly using HLE techniques. If you weren't around for this, it was quite the shocker at the time. I think the analogy is doubly apt, because HLE emulation has some serious limitations... it gets you maybe 80% of the way there really fast, and for the remaining 20% you need to roll up your sleeves and do serious LLE.

https://en.wikipedia.org/wiki/UltraHLE

    It takes the prompt and continues it based on weights in 
    the training data. If there is no data it picks the most 
    likely thing (maybe made up). If there is it’ll mostly 
    add things from that data. Maybe it’ll make tool calls and 
    pull in data that way too but you can’t actually trust all 
    the details.

I'd like you to point out which bits of this are different from talking to humans. If you replace "training data" with "memories", this is pretty much exactly how things might go if you asked a friend (or perhaps a flaky travel agent) for travel advice.

Note that I'm not arguing that LLMs are particularly talented at this particular use case. I'm pointing out that humans are also pretty unreliable.

You're also doing that thing where you point out that LLMs can be unreliable (yes, they are) without acknowledging how flawed nearly every other source of information is: people, websites, etc. I'm not defending LLMs in that regard... I'm just saying it's not a differentiator.

pookieinc · 2026-06-09T06:22:28 1780986148

To counter, we did the same with a trip to Copenhagen for 7 days and it got most paths correct. Train routes, places to visit with kids, restaurants, reservations, weather, most of it was great. There were a couple mistakes of course here and there, thankfully we did our due diligence, but by and large, we plan to do this for future trips.

orrito · 2026-06-09T06:27:22 1780986442

I feel like one city is enough focus to actually get good results. If you plan a trip where you move between different (smaller) places problems start to arise

nickpp · 2026-06-09T08:15:35 1780992935

I've been planning vacations with ChatGPT's Deep Research since it became available. Absolutely brilliant!

From finding areas with favorite activities for each of the parents, teens and kids to discovering the do-not-miss attractions and scheduling our vacation between them - it is invaluable. I've seen places I never knew existed in countries I've never been to before, where they speak languages I do not speak.

Very few mistakes and lots more flexibility and understanding than the travel agents I used before. I do write long prompts though with lots and lots of info about our family and what we like to do.

Not good yet at finding, filtering by our criteria, comparing and booking available accommodation, but it's getting there.

kakacik · 2026-06-09T09:06:08 1780995968

It's not pre-internet travel, rather backpacking. I do it to this very day; it's by far the best and most rewarding way to travel, the further and more exotic the better.

It has a downside - I'll never do these pre-arranged trips where one is in a complete luxury bubble. Interactions with locals are the best part of the experience. What a waste of potential.

And yes, it's mostly compatible with kids; it depends more on the specific location than the mode of travel (i.e. avoiding malaria/dengue/etc. regions)

lukan · 2026-06-09T06:48:37 1780987717

That was funny, thank you.

I am now reminded of a short trip with less tech-savvy folks, where I also noticed during the trip that the plan was a bit .. not working. And the person organizing it was complaining to the bus driver, asking why they were not going where the internet had told him they were going. The internet being ChatGPT.

wahnfrieden · 2026-06-09T06:19:14 1780985954

What was a “good model” and harness? I would expect decent results using say Codex with 5.5 xhigh to research and verify an itinerary. 5.5 Pro with search would also be promising.

skylurk · 2026-06-09T06:32:41 1780986761

I suspect it would perform admirably well with 'Paris' or 'Copenhagen' (see sibling comment), but if you want to have some real fun try 'Southern Spain' or 'Rural Malaysia'.

torben-friis · 2026-06-09T12:33:35 1781008415

OPPO just added a great feature into my phone. You point your camera at a foreign restaurant menu and it generates a translated version including a picture so you can get a general idea of what it usually looks like.

The genius part is that the menu is interactive, so you can add items to a shopping cart, which then results in a local language text you can show to waiters asking them for your full order.

It was a great sample of how even a little bit of ux can go a long way.

harrouet · 2026-06-09T06:11:28 1780985488

Spot on!

I am also under the impression that the LLM tech is plateauing before bringing the promised productivity. Great as a coding assistant, great at summarizing a text, translating, great at helping plan a trip...

But for the rest, e.g. acting as a life assistant, it is still far off with no hope of reaching the desired performance level.

I would not be surprised to see OpenAI and the like start reverting to Siri v1 strategies, i.e. "if this then that" kind of agent routing.

piokoch · 2026-06-09T06:34:16 1780986856

Why is this surprising? LLMs are good at generating text based on the stuff they were trained on. Software is text generation, translation is text generation, and LLMs can answer questions because billions were spent on tuning foundation models: people collected questions with answers in a (semi)automatic way, to the point that we might think LLMs are "thinking".

Now people want to handle car rental. What relevant data were models trained on for this kind of application? For Python code there are a kajillion examples on GitHub; for mathematical proofs there is an endless stream of papers, books, etc. But for car rental? Mostly ads on the internet that want to trick you into a bad deal. So yes, an LLM will be a disappointment, as it tries, well, to trick you into a bad deal. In addition, data is rather scarce, so there will be a lot of hallucination as it gets mixed up with yacht rental, bike rental, ski equipment rental, etc.

jorisw · 2026-06-09T07:44:37 1780991077

Who said it was surprising?

The performance of specific tasks will depend on either those tasks having been included in the training (which Apple could work on), or added by ways of fine tuning, and context sourced from userland.

For any category of tasks, there's a ton to be gained still in terms of how context is populated more effectively (relevance) and efficiently (token use). See software engineering harnesses and the skills architecture of OpenClaw for example. SWE harnesses make all the difference in how well Claude Code and OpenAI Codex perform. OpenClaw can't do shit without loading skills from the filesystem into context JIT.

I'll be very curious to find out how Apple is feeding context in their new AI approach. Part of it appears to be an 'index' that my iPhone started building (visible in main Settings screen) after installing the iOS 27 Developer Beta.

cootsnuck · 2026-06-09T04:23:48 1780979028

Yup, precisely. Turns out getting AI to be reliable at doing useful things is harder than we've all been led to believe by the dominant narratives.

https://www.normaltech.ai/p/new-paper-towards-a-science-of-a...

jawilson2 · 2026-06-09T15:58:59 1781020739

My high school kid has a printed basketball schedule for this summer. I tried taking a pic with Siri to parse it and add it to the calendar with their visual intelligence thing or whatever. It can't do it. I had Claude parse the calendar picture and generate an ics file. Calendar on iPhone and on their website can no longer import ics files. This is like remedial AI functionality that probably could have worked 20 years ago, and it can't do the simplest most basic task.
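For reference, the .ics file in that workaround is plain text in the iCalendar format (RFC 5545), so producing one from a parsed schedule is a small job. A minimal sketch using only the Python standard library follows; the game entries, the 90-minute duration, the `PRODID` string, and the function names are made up for illustration, and it skips time zones (it uses floating local times) and long-line folding.

```python
from datetime import datetime, timedelta, timezone
import uuid


def escape_text(value: str) -> str:
    """Escape the characters that iCalendar TEXT values require escaping."""
    return (
        value.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def build_ics(games, duration=timedelta(minutes=90)) -> str:
    """Build a calendar from (start, summary) pairs; times are floating local times."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//schedule-sketch//EN",  # hypothetical product id
    ]
    for start, summary in games:
        end = start + duration
        lines += [
            "BEGIN:VEVENT",
            f"UID:{uuid.uuid4()}@example.invalid",
            f"DTSTAMP:{stamp}",
            f"DTSTART:{start.strftime('%Y%m%dT%H%M%S')}",
            f"DTEND:{end.strftime('%Y%m%dT%H%M%S')}",
            f"SUMMARY:{escape_text(summary)}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # RFC 5545 uses CRLF line endings.
    return "\r\n".join(lines) + "\r\n"


# Hypothetical schedule entries standing in for the parsed picture.
games = [
    (datetime(2026, 6, 20, 18, 0), "Game vs. Eagles, Main Gym"),
    (datetime(2026, 6, 27, 10, 30), "Game vs. Hawks"),
]
ics_text = build_ics(games)
```

Writing `ics_text` to a file with `newline=""` keeps the CRLF endings intact. Whether a given calendar app will import the result is a separate question, which is the complaint here.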

lopis · 2026-06-09T16:15:18 1781021718

Lots of tech used to work and was just deprecated because it's not profitable. Google used to have amazing features on android, assistant and maps, that are just gone now, and slowly coming back in gemini.

panicinducer · 2026-06-09T07:11:07 1780989067

It reminds me of the Apple Vision Pro hype trailer. "Put on a helmet so you can view 2D photographs in 3D space." Present-day Apple sure has a way of making extremely impressive tech seem totally superfluous.

globular-toast · 2026-06-09T09:50:33 1780998633

Renting a car isn't a problem I have either. I've rented cars and vans plenty of times. I just phone up and talk to someone or email them. It's really easy.

A lot of these "problems" seem to stem from people just not wanting to interact with other people at all. Do we really want to become like Asimov's "Solarians"?

stevage · 2026-06-09T13:54:36 1781013276

Millennials and below absolutely do want to avoid interactions with strangers. Especially gen Z will go to great lengths to avoid making a phone call.

It sucks. This is why none of my local supermarkets have real checkouts anymore.

mingus88 · 2026-06-09T14:33:41 1781015621

I don’t think generational stereotypes are at fault here. This is another lazy retread about how Millennials/GenZ killed -insert thing here- straight from businessinsider.com

Nobody wants to make a phone call anymore because most calls are scams; phone networks are terrible and apps have replaced them, like a lot of legacy tech.

Supermarkets make more profit if they pass on the checkout labor to the customer. That’s the whole story.

These generations are disillusioned from decades of decline in our society that have root causes predating any of them.

stevage · 2026-06-09T16:03:45 1781021025

> Nobody wants to make a phone call anymore because most calls are scams; phone networks are terrible and apps have replaced them, like a lot of legacy tech.

This has nothing to do with the unwillingness of young people to use a phone to call a business.

> Supermarkets make more profit if they pass on the checkout labor to the customer. That’s the whole story.

It's half the story. The willingness of young people to accept it, and even prefer it, is the other half.

MrDunham · 2026-06-09T09:21:13 1780996873

Agree and disagree on the "planning a trip" use case... as I'm sitting on a river cruise on an AI co-planned vacation (we found the cruise, AI set the daily itinerary).

Now the big (BIG) caveat is that I used Claude Code on my Max 20x plan from within VS Code. I have a fairly decent harness that I'd built and was sure to prompt it to run several subagents, including one that grounded walking times with Google Maps directly.

I'd say this is FAR beyond what the average person would do ("Hey Siri, plan me a trip to Prague") but also it shows that the models can do it with the right harness and guidelines. This wasn't that hard for me to do, so it seems to be more of a feature buildout ("the travel expert" AI) with a few markdown files than anything.

All told: web search for grounding times/locations, map grounding for walking paths and times, an adversarial agent to keep the model(s) honest, and a little bit of prompting and you've got a really great travel planner.

In short: the average person won't do this, but if I can build it in a few hours any of the 100% of people working at Apple/OpenAI/Anthropic who are smarter than me can build it and bake it into Siri (or ChatGPT, Claude, etc).
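The grounding-plus-adversarial-check idea can be sketched in a few lines. This is not the harness described above: the place names, the lookup table standing in for a real map service, and every function name are hypothetical, and no model is called. It only shows the shape of the check, where each claimed walking time is compared against an independent source and anything unverified or off by too much gets flagged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leg:
    origin: str
    destination: str
    claimed_walk_min: int  # what the planning model asserted


# Hypothetical stand-in for a real map lookup (an API call in practice).
HYPOTHETICAL_MAP = {
    ("Site A", "Site B"): 12,
    ("Site B", "Site C"): 30,
}


def grounded_walk_min(origin: str, destination: str) -> Optional[int]:
    """Return the independently sourced walking time, or None if unknown."""
    return HYPOTHETICAL_MAP.get((origin, destination))


def review(legs, tolerance_min: int = 5):
    """Play the adversarial reviewer: list every leg that cannot be trusted."""
    problems = []
    for leg in legs:
        actual = grounded_walk_min(leg.origin, leg.destination)
        route = f"{leg.origin} -> {leg.destination}"
        if actual is None:
            problems.append(f"{route}: no grounding data, treat as unverified")
        elif abs(actual - leg.claimed_walk_min) > tolerance_min:
            problems.append(
                f"{route}: claimed {leg.claimed_walk_min} min, grounded {actual} min"
            )
    return problems


# A made-up draft itinerary such as a planning agent might return.
draft = [
    Leg("Site A", "Site B", 10),
    Leg("Site B", "Site C", 5),
    Leg("Site C", "Site D", 15),
]
for problem in review(draft):
    print(problem)
```

In a fuller setup the flagged legs would be fed back to the planner for another pass instead of just being printed.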

whywhywhywhy · 2026-06-09T09:02:33 1780995753

Extremely surprised we didn’t get the “book me a flight” example that has been an AI demo used over and over but everyone’s so particular about flights I can’t see many people just wanting to one shot it.

The laundry list of object removal, spatial photos, better speech to text etc is always just the latest open models being slapped in there and branded as Apple.

Ultimately the meat of this presentation was the work of people outside Apple.

stevage · 2026-06-09T10:17:15 1781000235

For me "book me a flight to X on day Y" is so easy to do manually I don't need AI.

Where I do want AI is for really complex queries, like "find me a time and money efficient itinerary through Europe visiting places I haven't been before. Present options and I'll tell you what I don't like about each of them then we'll narrow in on an optimal solution"

bsenftner · 2026-06-09T11:13:40 1781003620

The issue is anything that is of value requires some level of detail, of complexity, and that is only of interest to people that know that specific complexity, and it is a pain point for them. Now they'll care. Everyone else? Lost them. So, the marketing challenge is to find some aspirational complexity that people wish they knew, and how that can be solved with AI, and without turning that thing into a trivial nuisance, but a valued skill. That logical series right there is, well, too much for far too many.

audiala · 2026-06-09T09:44:49 1780998289

We are currently working on this with Audiala (iOS and webapp, Android should be released any coming days), making progress slowly but surely. We started to add tools to the AI chat to organise your trip and explore the place following your preferences and those of your group if you decide to share the trip. We can turn feedback into actual improvements pretty fast now, so would love to have yours and progress towards building the app you really want.

mr_toad · 2026-06-09T22:08:47 1781042927

> As an AI I can absolutely guide you through the process of renting the car, but I can't physically access the web site or type in the details for you

It’s like they were trained on corpora of box-ticking material, like ISO 9000 documentation, or security certifications. And now they know how to describe what they should be doing, but they never actually do anything.

fragmede · 2026-06-09T06:42:17 1780987337

All you're saying is that (the harness you're using for) Gemini sucks. OpenAI has their own web browser to fill out forms with and Claude has a whole Cowork feature that interacts with your computer including your web browser.

If new Siri still sucks, well, it's sucked the entire time. The worst of it is the security aspect where the setting to let you use Siri without authenticating hasn't worked since they added it! (still broken, iOS 26.5)

madrox · 2026-06-09T07:03:38 1780988618

I get the impression Apple designers don't actually use AI, and so have no idea what to build, since users don't know what they want from AI yet either.

lurking_swe · 2026-06-09T16:32:28 1781022748

i have not tried it since i’m too scared to try, but i know Codex recently came out with a feature where it can control your chrome browser. Not spin up a chrome browser, but control your open browser tabs.

Presumably you might be able to task it with planning an itinerary with specific dates and bookings in mind, and then ask it to complete the task…sort of. The big gotcha i think is payments. Obviously you wouldn’t want to enter your credit card details into an llm lol. perhaps it would be ok if you had a saved card on file with your favorite airline, etc? Or maybe chrome has a feature to autofill a credit card for quick entry? Not sure.

Still…it’s a messy unsolved problem and we’re definitely not there. I wonder how this tech will look in 10 years from now?

dyauspitr · 2026-06-09T06:21:57 1780986117

I sometimes do random weeklong roadtrips. ChatGPT is the best way to do this. Attractions that are esoteric, RV parks with 50A hookups for under $50, oldest/biggest/best restaurant with local food in a town, historical stops along the way, suburbs close to big cities but where hotel prices are lower etc.

jeffaf · 2026-06-09T16:35:29 1781022929

Skill issue. I've had gpt5.5 help me plan a vacation. It even pointed out where I could save money. I did the actual bookings but it created the plan.

blitzar · 2026-06-09T08:47:32 1780994852

> interesting how anemic the use cases seem to be

Have a conversation with the average AI power user (outside of tech / coding) and this is the level the conversation will be on.

chaos_emergent · 2026-06-09T13:22:33 1781011353

This morning, while putting in daily contacts, I realized I was down to my last few pairs. Still standing at the bathroom sink, I used the ChatGPT app on my phone to voice-command the Codex app on my computer to book an optometrist appointment for Saturday. It visited the website, figured out the API, and booked it.

No, this isn’t the same as planning a multi-day vacation. But it is plainly useful today, and it feels very close to handling more complex tasks like that.

Maybe the difference is the model and the harness. At this point, I’m starting to think some people are either gaslighting themselves about how useful these systems are, or overgeneralizing from one narrow setup. Gemini, for example, seems especially weak at agentic behavior.

The wholesale dismissal just feels strange coming from the HN community I’m used to.

stevage · 2026-06-09T13:52:10 1781013130

This is the kind of example that people use to demonstrate "usefulness" that falls so flat to me. I could make that phone call in a minute and have no doubts about whether my agent had stuffed up somehow.

It's just not compelling to say that an AI can do an easy task quickly. This is still worth zero dollars to me.

gedy · 2026-06-09T13:51:21 1781013081

It is neat, but come on, you could've booked from the website on your phone too.

ml-anon · 2026-06-09T12:28:44 1781008124

there is this bullshit term you hear paraded about the dwarkesh-adjacent circles: "capability overhang". Aside from being effectively meaningless jargon, there is a kernel of an idea that somehow the models are far more capable than what "normies" use them for.

Well, I think Siri AI puts this notion firmly to rest. Yes, if you have unlimited tokens and well-posed problems you can solve open Erdos problems. However, if you have meaningful real-world computational and reliability constraints then you better just stick to "summarize my messages and find the dogs in my photos".

And this isn't just Gemini, I can burn effectively unlimited Opus tokens and still get garbage code out or be run around in circles without very diligent oversight.

JamesKaranja · 2026-06-09T15:26:48 1781018808

It feels as if Apple is trying to create deterministic use cases for AI, a non-deterministic solution!

fruit2020 · 2026-06-09T05:04:21 1780981461

Booking and renting is certainly possible, they just need your auth credentials

londons_explore · 2026-06-09T08:11:05 1780992665

I gave some LLMs my password manager and some credit cards to try to do this sort of thing lately.

They failed most of the time. Simple things like finding the right password for Gmail were sometimes out of reach. Anti-bot techniques sometimes stopped them.

Impressively, sometimes they'd successfully write hugely complex bash or python scripts to do tasks on web pages they hadn't managed to do with the browser automation.

Jzush · 2026-06-09T19:57:27 1781035047

Yes, big companies do not know how to advertise AI. AI isn't a killer feature for anyone except people working in AI. No one uses it for anything but party tricks otherwise. The things AI would actually be useful for, it is incapable of doing.

AI as a "product" is about sucking up data for corporate interests first, then providing functionality to common people last with probably a few other steps in between.

Marketing departments have to twist themselves into pretzels and invent customers that don't exist in hopes of selling AI to people who look at those fake customers in the ads and go "Gee, I wish that was me!". People who casually book trips to Japan to shop for vintage clothes generally don't exist in such large numbers that they justify entire product stacks.

Here's what I need AI to do. Open an app, perform an action in said app, close app. Maybe open multiple apps and do things in other apps that are contingent upon data from one of the other apps.

Here's what AI can do. Poop emoji with glasses....

OptionOfT · 2026-06-09T15:49:57 1781020197

My iPhone knows the forecast for today.

I installed iOS 27 yesterday.

I asked it: please notify me when the temperature goes above 80F (so I can close the windows).

Siri responded: it'll be 99F today in Phoenix.

...

mark_l_watson · 2026-06-09T19:50:21 1781034621

do you live in Phoenix?

I just asked Siri a few weather questions and named the city where I live, nailed it. My favorite digital device is my Apple Watch and if Siri improves over the next year or two, that will be great for me.

OptionOfT · 2026-06-10T17:19:53 1781111993

I live in Phoenix. I would like it to tell me: 8am. Not what the actual high is today.
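
A rough sketch of the behaviour being asked for, assuming an hourly forecast is already available as (hour, temperature) pairs. The function name and the sample numbers are made up for illustration; this is not any Apple or Siri API.

```python
from typing import Optional


def first_hour_above(
    hourly_temps_f: list[tuple[int, float]], threshold_f: float
) -> Optional[int]:
    """Return the first forecast hour whose temperature exceeds threshold_f.

    Returns None if the threshold is never crossed.
    """
    for hour, temp_f in hourly_temps_f:
        if temp_f > threshold_f:
            return hour
    return None


# Hypothetical forecast data: (hour of day, degrees Fahrenheit)
forecast = [(6, 74.0), (7, 78.5), (8, 81.0), (9, 85.0), (15, 99.0)]

alert_hour = first_hour_above(forecast, 80.0)
if alert_hour is None:
    print("Temperature is not forecast to go above 80F today")
else:
    print(f"Notify at {alert_hour}:00")
```

The point is that the answer is the time of the crossing, not the daily high.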

mark_l_watson · 2026-06-11T12:41:07 1781181667

I am on a wait list for the ‘better Siri model’ - what iOS and macOS betas just shipped with is awful. I do think the Apple Foundation Model built in to the system is better: I was using it from Python yesterday and it performs tool calling accurately and it is a very small model.

dansquizsoft · 2026-06-09T06:30:04 1780986604

"I got the top tier of Gemini AI." - that's your issue, get a subscription to a lab offering actual frontier models like Anthropic and OpenAI.

Mistletoe · 2026-06-09T06:45:26 1780987526

Nonsense.

https://livebench.ai/#/?highunseenbias=true

tmoravec · 2026-06-09T10:22:15 1781000535

Model != AI tool. Especially with Gemini, the model is fantastic, both in benchmarks and in the API. Sentiment is not that positive for Google AI.