Id also be interested in more details as sibling comment. I find that when I try to build stuff, its like building skyscraper from straw. What methods are moving you forward the most?
There's a 64MB game boy advance cartridge with shrek on it [1]. Looks pretty horrible [2]. But the GBA only has 16KB fast / 256KB slow RAM, and a 16MHz CPU.
Video resolution: 128x72, hahah. Late 90s RealPlayer postage stamp video is back! To its credit, that whole movie is probably smaller than RealPlayer itself was.
I wish I knew how to make it regressively verify its assumptions, like a kind of hook but firing before a sentence is written, or perhaps after and then corrected. I feel like it assuming things clearly wrong is its biggest weakness.
> drop the human pretense entirely. Make the agent sound clinical, robotic
Id pay to be able to reliably set LLMs to this mode, but ofc because LLMs are taught on corpus of HUMAN text, they always, sooner or later, return to the good old penpal mode.
Also, in Claude Desktop app, I ask to edit a file, it complains it cant access files, I then realize im in Chat and not Code interface. Why cant such a smart machine figure out to switch the modes, or borrow the skills/abilities from one tab away into this tab? Instead I get A4 page of text explaninig what can I do to edit the file myself or how to feed it, but the "just click Code" is just never there. I would guess this is just a system prompt away, why is all this still so neglected?
> such a smart machine figure out to switch the modes
Because it's not smart. We keep confusing verbosity with smartness. AI will happily keep yapping nonsense to an inattentive listener. An actually smart entity would not do that if not acting maliciously.
I've found OpenAI's personality setting works pretty well if you choose the professional personality and disable all the frills. I did that a while back so I'd have to go dig up the details. Since then it doesn't even engage me if I make a joke or similar - it just focuses on the goals.
> Id pay to be able to reliably set LLMs to this mode,
You can do it for free. Just give it instrucitons to avoid emotional tones and flattery and it will sound a lot more robotic.
If you look into other examples I'm sure you will find other good instructions based on your need
Harnesses do fix it IMO - it’s why Claude code and Codex had a massive jump in alleged productivity on release and then seems to have flatlined. But a custom harness _would_ allow you to do things like “on every message, run lint validation and tests”. That in and of itself would be wildly useful.
The harnesses we have are almost stunningly incomplete IMHO. I've been trying `pi` recently, and quite like that it comes with a minimal set of tools by default -- and that I can easily override or replace the ones that it ships.
I've only just started working with it, but clamping `read/write/edit` to only allow editing files in the current directory, banning `bash` and mandating I write tools for the specific commands I want it to execute, has made me much happier. Running Claude inside a VM or similar to sandbox it is nuclear overkill; I've always been surprised that that's seemed like the state of the art.
With a better harness, the model can't choose to rename things with search and replace; if it wants to rename things, it _must_ call the LSP to do it. If it's going to write code, as you suggest, the harness _forces_ linting/formatting to run.
(Reading my own comment back, I am worried that the fucking AI writing style is infecting me :()
One of the problems with tools is the permissions for them. I can either grant Claude access to this one specific python command, or free run with python to do whatever it wants, but not “you can execute the python scripts in this directory structure”.
Claude’s “api access required” approach means that I can’t even experiment with customising the harness without doubling up…
Yes, that aggravates me too. I noted recently, Claude had a phase where it would write little one-off Python scripts to aid it in analysis -- which is super useful! But when it's written ten scripts in a row, each of which I've had to review and each of which I've had to approve by hand, it gets pretty annoying. If I could bless it with "if it only uses these Python libraries, pre-approve the script", that would've made life a lot simpler, but of course, that's not possible. Sigh.
Honestly - I think it's because it goes against the "vibe" part of the tooling - why do you care what the code looks like as long as when you run it it does what you want it to do?
I just want my brightness/volume indicator back in the middle of my screen without fluffy graphics.... :( I dont know why it isnt even a power-user option....
Well, that's okay; you're young. There are better and more topical jokes in your future, and it will serve you well in making them to have encountered this particular, extremely stale and suspiciously stained, cookie. Just be careful you don't take too big a bite!
reply