Honestly, it sounds like, assuming you have no ethical qualms, you could get by with a Mac or an AMD 395+ and the newest models, specifically QWEN3.5-Coder-Next. It does exactly what you describe. It maxes out around 85k context, which, if you do a good job providing guardrails, etc., is enough for a small-to-medium project.
It does seem like the sweet spot between WALL-E and the destroyed Earth in WALL-E.
I have ethical qualms to varying degrees with most LLMs, primarily because of copyright laundering.
I'm a BSD-style Open Source advocate who has published a lot of Apache-licensed code. I have never accepted that AI companies can just come in and train their models on that code without preserving my license, instead allowing their users to claim copyright on generated output, take it proprietary, or do whatever they like.
I would actually not mind licensing my work in an LLM-friendly way, contributing to a public pool whose generated output would itself remain in that pool. Perhaps there is an opportunity for Open Source organizations to evolve licenses to facilitate such usage.
For what it's worth, I would be happy to pay for a commercial LLM trained on public domain or other properly licensed works whose output is legitimately public domain.
That's pessimistic. Do the calculation assuming cloud provider X changes your nondeterministic output every Y months with probability Z and increases prices by 10% every 6 months.
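A minimal sketch of that back-of-the-envelope calculation, with entirely hypothetical numbers (base price of $20/month, Y=6 months, Z=0.5 probability of a model/output change per period, 10% price hike every 6 months):

```python
# Hypothetical illustration of the comment's calculation; all values
# (base price, Y, Z) are made-up placeholders, not real provider data.

def price_after(months: int, base: float = 20.0,
                hike: float = 0.10, period: int = 6) -> float:
    """Price after a `hike` fractional increase compounds every `period` months."""
    return base * (1 + hike) ** (months // period)

def p_output_changed(months: int, z: float = 0.5, period: int = 6) -> float:
    """Probability the nondeterministic output has changed at least once,
    given an independent change with probability `z` each `period` months."""
    return 1 - (1 - z) ** (months // period)

for m in (6, 12, 24, 36):
    print(f"{m:>2} mo: price ${price_after(m):.2f}, "
          f"P(output changed) {p_output_changed(m):.2f}")
```

Even with a modest per-period probability, the chance that your output has silently changed approaches 1 within a couple of years, while the price compounds geometrically.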
Slow and steady is worth exponentials. Keep slopping it, my boid.