Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

>Bashing in a car so someone else can’t ride it? It just makes the other side look like the good guys and their side look like the bad guys

Not really. If your childish definition of "good" and "bad" (always a sign of a simplistic mind btw) is "good for Waymo's bottom line" and "bad for Waymo's bottom line" then sure. But if your community's people are suffering and looking to rideshare gigs for a lifeline, and then a company threatens them, then destroying their assets is easy to justify as "good" from a humanistic perspective, even categorically.


So it burns tokens? Funny how that lines up with the incentive to pump numbers before going public

You’re saying you’d rather support a communist government in this matter than your fellow citizens. Thats a cool direction.

There was no caricature as a "AI follower" etc. I gave my arguments you didn't.

This works with touch control on mobile. Pretty amusing.

When someone submits PRs fulky made by Clade any "cooperative work" is out the door

The scoring is just based on a simple prompt which is given the game state at the start and end of the turn and the log of tool calls and the final turn summary. The prompt asks it to evaluate the quality of the simulation from 0 to 10, and to give pass or fail for if it is legal.

It is far from ideal, but from my testing, even underpowered small LLMs that could not complete a single legal turn were reasonably good at judging if a simulation was legal. The final judging was all done by gpt-5.5 (medium) which might have given the OpenAI models an advantage, but from all the simulations I looked at, it seemed pretty fair.

This benchmark ended up be more of a test of how well an LLM can call tools without contradicting itself or backtracking. Most of the failures were not because of breaking magic rules, but because it could not sequence the tool calls correctly.

For example: https://app.mtgautodeck.com/public/benchmarks/6349dda2-4069-...

and: https://app.mtgautodeck.com/public/benchmarks/dcc18bd8-339d-...

The failure mode seems to be that some models are overly trained to start tool calls, even when the model itself knows that it should not be calling the tool. Both of those examples were not errors because the judge prompt said they were illegal. In both of those examples the model stopped the simulation itself knowing that it made a tool error.

The Opus 4.8 examples are especially weird because it will consistently make the same tool call error 2 or 3 times in a row, and it will put things like "placeholder" or "noop" for the tool call reason.


How about reviving key signing parties?

I think the freeways are going from larger particles to smaller ones as DEF gets rid of the bigger diesel particles.

smaller stuff is more dangerous and goes deeper into your body:

- PM10: inhalable dust entering the nose and mouth.

- PM4: respirable fraction that can reach the gas-exchange region.

- PM2.5: fine particles that penetrate deep into the lungs.

- PM1: ultrafine particles with potential to translocate beyond the lungs.


Were you trying to link to something? There's no body or link in your post, so I can't tell what you're referring to.

Why learn arithmetic when we have calculators?

Reading, writing, and math are foundational skills that, aside from having enormous utility in their own right, are also crucial for developing sharp, creative, and analytical minds.

Take writing as an example: it challenges you to organize your thoughts, patch up the weaknesses of your arguments, and find effective means of connecting with your audience. In so doing, you restructure your own understanding of the world, deepening your expertise and mental schemas. That's something an LLM can't do.


The UI looks incredibly sharp and the core problem you are solving is very real. Excited to see how the roadmap evolves!

Ian Rush said it best: "It's best being a striker. Miss five, score the winner, you're a hero. The goalkeeper plays a blinder, lets one in, and he's a villain."

Every place I've worked rewards the firefighter over the person who made sure nothing ever caught fire. And the worst part is the math is obvious to everyone except the people who set the incentives.


> Logic is not subject to sanctions

... and vice versa.


The US could avoid this by making sure that the social contract here is still in place. People destroy and protest self driving cars because they aren't liberatory automation; they're just another taxi billing us except that they also put humans driving rideshares out of a job. Does China have a problem with their people being forced to do rideshare gigs to get enough food to eat and competing with self driving cars?

NEVER take Hofstadter's law into account!!!!

Yes but most people aren’t trailblazers, they’re horribly average and just want to get by. This includes me.

This industry is hostile. You need a self preservationist mindset if you want long term success. Or, at least, it feels that was for someone who isn’t absurdly talented, wealthy, or connected. So, for now, we put our head down and be good little cogs.

I think of it this way. In a company with a good culture, I will build a rapport with my management and give my honest opinions. In a company with a bad culture, I just nod along and say “uh huh yeah yes sir what a great idea!” Because I know that’s it’s gonna get pushed through no matter what I say, and my opinions will only serve to hurt me.

And, right now, tech, or even America, is like one big company with a rotten, rotten culture.


Like I said earlier, this technology does not exist. And even if it did, the infrastructure required for everyone to own and operate such a car would be orders of magnitude more expensive and much much much politically harder choice to approve then to build out public transit and to provide access services.

It's a much cheaper way to run a freemium pricing plan than to give out actual time on a real GPU.

The free models aren't as advanced as the commercial ones, so you still need to pay to get better service. Many companies do this by hosting the smaller models for free, and charging for access to the bigger ones, but the costs of running even the small models adds up, so if they can get you to run it on your own hardware, they save a lot on GPU time, but still get the advertising exposure of a freemium pricing plan.


Cursor's Auto mode is built on this premise though I can't say how effectively it categorizes with limited experience.


Yea man. That is what sensible people do. Use these as a better search, and use it to lookup, and learn stuff while YOU do stuff.

And make maximum use of it to learn as much as possible, while it lasts...


If you do something at work with a subordinate, it’s no longer personal.

I find it is a quite reliable workflow to ask a strong model to design a plan and then point a weaker one at executing. The agent harnesses themselves are baking in similar concepts though.

We're about to win Genocide Bingo.

Rule 1 of Genocide Bingo is: Don't win Genocide Bingo.

https://www.youtube.com/watch?v=4kDPxbS6ofw


Sounds like you enjoy car analogies. Great!

Do you think you are more likely to get injured in a car accident or have an agent do something harmful because you ran it outside of a sandbox?


The pedestrian is not the one driving the weapon. It is your fault if you hit a pedestrian, even if they walk right off the curb, even if they’re drunk.

Unless they intentionally jump in front of your car in an attempt to commit suicide, it is always on you to ensure you can stop and respond to an emergency, and a pedestrian is such a thing, imho.


The article below this one clued me into this. https://news.ycombinator.com/item?id=48500303. However, it's dead (idk y) and in my experience, most dead submissions stay that way.

I mean we're talking about people who decided their life's work would be to run a recipe website so we already can't expect that much.

Having heard radio interviews with and without 'internal editing' to remove ums and ahs, most of the time I'd rather the edited version. It's more concise and focused, and I find it easier to comprehend. Too many ums and ahs and my mind wanders, and if it's radio, I can't go easily go back to try again. When I've listened to podcasts or audiobooks, I could never easily go back a little to try again either, and I gave up on them (even though I have some content I really want to listen to, it's too frustrating, so it's not happening). But I'm sure other people have different preferences.

I also don't care for writing that could have been made a lot more concise. It's a lot of work to make things shorter, but I think it's worthwhile.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: