Hacker News | ok_dad's comments

I love you! This was the best sim game when I was a kid, and now I'll spend my weekend on your site!

I think people today assume that compartmentalization is easy, but sometimes in life your work, your personal life, and everything else get all mixed up, and you end up in situations that others might call unhealthy, but for you it's fine and it's how you want to live your life.

That's just to say that crying over GitHub is fine; you're a human, and we cry over all sorts of stuff. Emotions are weird and you shouldn't feel bad for having them.


Yea, just buy $300k worth of hardware and Bob's your uncle.

It was pretty hard to justify the purchase to the board, but we got a decent deal from a nearby data center (~15% discount). Thankfully, it's fixed cost, it's an asset we can use for our taxes, and it will survive for years to come. The only thing we have to work on is maintenance, as well as looking into some renewable energy options.

We're also looking into how to do some secure cost sharing with this, so that all people need to pay for is what it costs for us to run everything! We're just planning on reserving at least 51% of the capacity for ourselves and the rest for everyone else.


Sorry, didn't mean to be dismissive, I was just being a dickhead needlessly.

I actually respect this a ton, good work.


It's fine! There's no world where individuals can buy this kind of stuff. Our company is too small to do it, but I'd love for there to be a public utility of sorts for being able to use LLMs. It is absurd that only these >$1T companies are allowed to run this. I also find it dangerous for society to have so much power and wealth concentrated there too.

Anyway, this is the internet and skepticism is warranted :D.


Yea, I actually looked into a similar thing myself recently. I was looking at how we could replace Cursor, and I found that for ~10 people we'd need a half dozen H100s or something on that scale, which would cost ~$1500 per developer or so to build and maintain on cloud infra, and to buy it would cost roughly 3 developers' yearly salaries or so (this aligns with your numbers). We don't use that much inference, so we decided paying Cursor ~$200-300 per dev per month is better, for now, but in the future we might regret that when prices normalize more. However, we also don't use cloud agents or independent agents; we basically use AI as a pair programmer, so if we had to drop AI coding assistants completely our process wouldn't break too badly. I wish I could task my 3080 gaming card to do some inference, but I can only fit ~10B models on there, so it's kinda worthless unless it's for something a small model can do.

The best deal is arguably to buy as much on-prem inference as you can reasonably expect to use by running the hardware around the clock, even at slower throughput, and use third-party inference for things that are genuinely latency-sensitive. I just don't see how this resolves to needing a half-dozen H100s; surely you're not using that much compute? You don't need to place your entire model on GPU: engines for on-prem inference generally support CPU/RAM-based offload.

One dev's salary to give a 10 person team unlimited approximately free agentic coding for the foreseeable future, plus privacy.

And another salary to have someone set up and run it


That's the one!

Amazing read. Thanks to both of you for finding that.

> I later researched this further and found that no one at Microsoft, not a single soul, could articulate why up to 173 agents were needed to manage an Azure node, what they all did, how they interacted with one another, what their feature set was, or even why they existed in the first place.

This reads like a description of the SLS-based Artemis program (SLS: Space Launch System, derisively the "Senate Launch System"), which somehow ended up deciding that the insane Lunar Gateway should be a thing.

Destin (SmarterEveryDay on YouTube) [0] called out the entire nutball scheme to NASA, at NASA. This includes the SLS/Orion/Lunar Gateway insanity, and calling out the unknown, but very large, number of on-orbit refuelings that Starship would need to get to the Moon.

In that video's comments, I believe there is someone who worked on the Orion-related systems who says, roughly: "Yeah, we thought the delta-v was too low; we could have increased it, but no one was speaking with each other at a whole-system level."

The mission drift at large orgs, gov and corp, is a huge problem that might one day be solved?

[0] https://www.youtube.com/watch?v=OoJsPvmFixU


Large orgs aim to produce some type of output. Their entire existence stems from a "perverse incentive."[1] Governments produce bills and laws, corps produce short-term profits, etc. I am pretty sure that preventing this type of waste consumes significantly more energy than creating the waste - e.g., the Agile Manifesto, the Rework book.

Jobs was probably a good example of this. In my opinion, his image as an innovator is vastly exaggerated. What he did do well was to not invent things: e.g., Liquid Glass would never have seen the light of day under him. He was adept at saying "no" and preventing waste. Apple is now at the whims of anyone with the next stupid idea, the ideal example of wasteful behavior.

[1]: https://en.wikipedia.org/wiki/Perverse_incentive


Maybe go back and edit or delete your other ignorant comments now that you’ve actually learned something. You sat here derailing the conversation for an hour because you lacked the motivation to look up facts.

I don't need to delete my comments. They stand the way they are. I'm comfortable with making mistakes and being wrong. And btw, it's not a derailment to ask questions related to the jurisprudence of this administration's criminal actions.

I agree with this attitude and think it's helpful.

In the modern age there's no one set of things that people have seen, no common base of knowledge, and the process in this comment thread was super informative in the sense that if I had provided some links to start, a lot of confusion could have been avoided.


Maybe you need to be less comfortable with being wrong and educate yourself about politics more before you comment a dozen times with misinformation, then claim you're totally innocent and were just "asking questions". That shit is so common amongst Trump supporters. Not saying you are one, but you're playing the same game they do.

I was transparent about where I was at. You are making statements based on nothing. We are all human, we make mistakes; you're the one with the problem if you can't see this.

You almost seem proud of your ignorance; maybe you should be learning something from this rather than just being fine with your hour of arguing misinformation. If you are truly not ashamed of being ignorant and spreading FUD because of it, then I guess there's nothing more for me to say to such a person. You didn't just make a mistake: you sat here and doubled and tripled and quadrupled down for an hour while people tried to correct you over and over, and you simply didn't look anything up, only coming to your realization after someone fed you a link.

This is why they say lies go round the world before the truth gets its shoes on.


Pulling into the bike lane for 30 seconds forces bikers to unsafely pull around the car, possibly causing accidents. In some cities and lanes you may be endangering dozens of bikers during those 30 seconds.

I had to commute by foot for two years into a city, and I have to say I understand the rage. Cars nearly killed me a dozen times, and I was always safer than the law required of me as a pedestrian. Most drivers don't understand their power with today's massive cars.


>Pulling into the bike lane for 30 seconds causes bikers to have to unsafely pull around the car

Or, hear me out, they could stop if passing the car is unsafe.


Victim blaming right there. They have a right to the lane; cars don't.

So you're advocating... what exactly? Crashing into the car on purpose because it's their right?

No, what the fuck, I’m advocating for cars to stay the fuck in their own space. Are you being purposely obtuse?

Cars in London apparently have a right to pull over to drop and pick up passengers.

Military academies are not upper class at all; they're mostly middle-class folks. Officers are generally of the same stock as any other white-collar job in engineering, law, business, etc.

Why are you letting the LLM drive? Don't turn on auto-approve, approve every command the agent runs. Don't let it make design or architecture decisions, you choose how it is built and you TELL that clanker what's what! No joke, if you treat the AI like a tool then you'll get more mileage out of it. You won't get 10x gains, but you will still understand the code.

Personally I've found "carefully review every move it makes" to be an extremely unpleasant and difficult workflow. The effort needed to parse every action is immense, but there's a complete absence of creative engagement - no chance of flow state. Just the worst kind of work which I've been unable to sustain, unfortunately. At this point I mostly still do work by hand.

It's unpleasant for me at normal speed settings, but on fast mode it works really well: the AI does changes quickly enough for me to stay focused.

Of course this requires being fortunate enough that you have one of those AI positive employers where you can spend lots of money on clankers.

I don't review every move it makes; rather, I have a workflow where I first ask it questions about the code, and it looks around and explores various design choices. Then I nudge it towards the design choice I think is best, etc. That asking around about the code also loads up the context appropriately so that the AI knows how to do the change well.

It's a me-in-the-loop workflow, but it prevents a lot of bugs, makes me aware of the design choices, and, thanks to fast mode, is more pleasant and much faster than doing it manually myself.


This is my biggest problem with the promises of agentic coding (well, there are an awful lot of problems, but this is the biggest one from an immediate practical perspective).

On the one hand, reviewing and micromanaging everything it does is tedious and unrewarding. Unlike reviewing a colleague's code, you're never going to teach it anything; maybe you'll get some skills out of it if you find something that comes up often enough that it's worth writing a skill for. And this only gets you, at best, a slight speedup over writing it yourself, as you have to stay engaged and think about everything that's going on.

Or you can just let it grind away agentically and only test the final output. This allows you to get those huge gains at first, but it can easily just start accumulating more and more cruft and bad design decisions and hacks on top of hacks. And you increasingly don't know what it's doing or why, you're losing the skill of even being able to because you're not exercising it.

You're just building yourself a huge pile of technical debt. You might delete your prod database without realizing it. You might end up with an auth system that doesn't actually check the auth and so someone can just set a username of an admin in a cookie to log in. Or whatever; you have no idea, and even if the model gets it right 95% of the time, do you want to be periodically rolling a d20 and if you get a 1 you lose everything?


I agree, but I also think that giving the LLM free rein is also extremely unpleasant and difficult. And you still need to review the resulting code.

I don't think there's anything difficult or unpleasant about the process of letting the LLM run free, that's the whole point, it's nearly frictionless. Which includes not reviewing the code carefully. You say "need" but you mean "ought".

Friction is not the only source of displeasure. I've tried out vibe-coding for something non-trivial; I found it deeply unpleasant.

Reviewing isn't hard when the diff is what you asked for. It's when you asked for a one-line fix and get back 40 changed lines across four files. At that point you're not even reviewing your change anymore, you're auditing theirs.

I agree with this too. I decided on constraints for myself around these tools and I give my complete focus & attention to every prompt, often stopping for minutes to figure things through and make decisions myself. Reviewing every line they produce. I'm a senior dev with a lot of experience with pair programming and code review, and I treat its output just as I would those tasks.

It has about doubled my development pace. An absolutely incredible gain in a vacuum, though tiny compared to what people seem to manage without these self-constraints. But in exchange, my understanding of the code is as comprehensive as if I had paired on it, or merged a direct report's branch into a project I was responsible for. A reasonable enough tradeoff, for me.


That's the trap though. The moment you approve every step, you're no longer getting the product that was sold to you. You're doing code review on a stochastic intern. The whole 10x story depends on you eventually looking away.

Just don't buy the tools for 10x improvements; buy them for the 1.1x improvement and the help they give with the annoying stuff like refactoring arguments to a function that's used all over, writing tests, etc. They can also reduce cognitive load in certain ways when you just use them to ask about your large code base.

Because the LLM prompts you back to the terminal too frequently for the human to engage in parallel work.

I’m basically saying don’t do parallel work, use it as a tool. Just sit there and watch it do stuff, make sure it’s doing what you want, and stop it if it’s doing too much or not what you want to do.

Maybe I’m just weird (actually that’s a given) but I don’t mind babysitting the clanker while it works.


I define tools that perform individual tasks, like build the application, run the tests, access project management tools with task context, web search, edit files in the workspace, read only vs write access source control, etc.

The agent only has access to exactly what it needs, be it an implementation agent, analysis agent, or review agent.

Makes it very easy to stay in command without having to sit and approve tons of random things the agent wants to do.

I do not allow bash or any kind of shell. I don't want to have to figure out what some random python script it's made up is supposed to do all the time.


This is a cool idea, can you write more about how your tools work or maybe short descriptions of a few of them? I’m interested in more rails for my bots.

I just made MCP servers that wrap the tools I need the agents to use, and give no-ask permissions to the specific tools the agents need in the agent definition.

Both OpenCode and VS Code support this. I think in Claude Code you can do it with skills now.

The other benefit is that the MCP tool can mediate, e.g., noisy build tool output, reducing token usage by showing only errors or test failures, nothing else, or simply an OK response with the build result or test count.

So far, I have not needed to give them access to more than build tools, git, and a project/knowledge system (e.g. Obsidian) for the work I have them doing. Well and file read/write and web search.


Cool, thanks for the additional details!

I use Cursor but it's getting expensive lately, so I'm trying to reduce context size and move to OpenCode or something like that which I can use with some cheaper provider and Kimi 2.5 or whatever.


Because it's SO much faster not to have to do all that. I think 10x is no joke, and if you're doing an MVP, it's just not worth the mental effort.

POC, sure (although 10x-ing a POC doesn't actually get you 10x velocity). MVP, though? No way. Today's frontier models are nowhere near smart enough to write a non-trivial product (i.e. something that others are meant to use), minimal or otherwise, without careful supervision. Anthropic weren't able to get agents to write even a usable C compiler (not a huge deal to begin with), even with a practically infeasible amount of preparatory work (write a full spec and a reference implementation, train the model on them as well as on relevant textbooks, write thousands of tests). The agents just make too many critical architectural mistakes that pretty much guarantee you won't be able to evolve the product for long, with or without their help. The software they write has an evolution horizon between zero days and about a year, after which the codebase is effectively bricked.

There are a million things in between a C compiler and a non-trivial product. They do make a ton of horrible architectural decisions, but I only need to review the output and ask questions to guide that, not review every diff.

A C compiler is a 10-50 KLOC job, which the agents bricked in 0 days despite a full spec and thousands of hand-written tests, tests that the software passed until it collapsed beyond saving. Yes, smaller products will survive longer, but how would you know about the time bombs that agents like hiding in their code without looking? When I review the diffs I see things that, had I let them in, would have killed the codebase in 6-18 months.

BTW, one tip is to look at the size of the codebase. When you see 100KLOC for a first draft of a C compiler, you know something has gone horribly wrong. I would suggest that you at least compare the number of lines the agent produced to what you think the project should take. If it's more than double, the code is in serious, serious trouble. If it's in the <1.5x range, there's a chance it could be saved.
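That rule of thumb is easy to encode as a toy check. The function name and verdict strings here are hypothetical; the thresholds are just the ones stated above:

```python
def bloat_verdict(actual_loc: int, expected_loc: int) -> str:
    """Compare agent-written LOC against what you'd expect the project
    to take by hand, using the rough thresholds from the comment above."""
    ratio = actual_loc / expected_loc
    if ratio > 2.0:
        return "serious trouble"   # more than double: something went horribly wrong
    if ratio <= 1.5:
        return "maybe salvageable" # within ~1.5x: there's a chance
    return "borderline"
```

E.g., a 100 KLOC first draft against an expected ~30 KLOC lands at a ratio above 3, well into "serious trouble".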

Asking the agent questions is good - as an aid to a review, not as a substitute. The agents lie with a high enough frequency to be a serious problem.

The models don't yet write code anywhere near human quality, so they require much closer supervision than a human programmer.


A C compiler with an existing C compiler as oracle, existing C compilers in the training set, and a formal spec, is already the easiest possible non-trivial product an agent could build without human review.

You could have it build something that takes fewer lines of code, but you aren't going to find much with that level of specification and guardrails.


This is significantly slower than just writing the code yourself.

I don’t find it slower overall, personally, but YMMV depending on how you like to tackle problems. Also the problem space and the project details can dictate that these tools aren’t helpful. Luckily the code I write tends to be perfect for a coding agent to clank away for me.

For that kind of flow, I prefer to work without AI.

The agent mostly helps me reduce cognitive load and avoid the fiddly bits. I still review and understand all of the code but I don’t have to think about writing all of it. I also still hand write tons of code when I want to be very specific about behavior.

I have never found any utility in that. After all, you can still just review the diffs and ask it for explanation for sections instead.

> After all, you can still just review the diffs

anonu has explicitly said that they've wiped a database twice as a result of agents doing stuff. What sort of diff would help against an agent running commands, without your approval?


Hah, I run my agent inside a Docker container with just the code. Anything clever it tries to do just goes nowhere.

The agent does not have to run in your user context. It's an easy mistake to make in YOLO mode, but after that it's easy to fix. E.g., this is what I use now so I can release the agent from my machine and also constrain its access:

    $ main-app git:(main) kubectl get pods | grep agent | head -n 1 | sed -E 's/[a-z]+-agent(.*)/app-agent\1/'
    app-agent-656c6ff85d-p86t8                          1/1     Running     0             13d
The agent is fully capable of making PRs etc. if you provide appropriate tooling. It wipes the DB, but the DB is just a separate ephemeral pod. One day perhaps it will find a 0-day and break out, but so far it has not.

> After all, you can still just review the diffs

The diff: +8000 -4000


You can ask it to make the changes in appropriate PRs. SOTA model + harness can do it. I find it useful to separate refactors and implementations, just like with humans, but I admittedly rely heavily on multi-provider review.

It’s terribly slow

I get it, but if tomorrow every inference provider doubled costs, I'd still understand my application's code and could continue to work on it myself.

I hear this a lot but I don't think decades of experience atrophies irretrievably so quickly as to make it worth it (alone) to abstain from making full use of these tools. I still read and direct enough of the architecture to not be lost in the code it generates. Maybe you haven't tried using agents to reorganize/refactor as much - I have cleaner code than I did before when it was done by hand, because I can afford to tackle debts.

I also don't find the permissions it prompts for very meaningful. Permission to use a file search tool? Permission to make a web request? It's a clumsy way to slow it down enough for me to catch up.


You can push thousands of LOC every day while approving manually. If you went any faster you would not be able to read the code.


Solving “cars killing kids” is a lot harder than solving “dropping bombs on schools killing kids”. With the latter, we simply stop doing it. We can’t just stop using cars right away.

