What I don't quite understand is why one of the most advanced AI labs would use rudimentary, broken text-match heuristics to track and detect abuse. Why not run simple inference on the actual turns out of band and, if abuse is detected, adjust quotas semi-retroactively?
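Roughly something like this, as a sketch of what I mean (every name here is a placeholder, not anything Anthropic actually runs; classify_abuse stands in for a cheap out-of-band model call):

    import random

    ABUSE_THRESHOLD = 0.9
    SAMPLE_RATE = 0.05  # score only a small fraction of sessions to keep inference cost down

    def classify_abuse(text: str) -> float:
        # placeholder for a call to a small out-of-band classifier model
        return 0.0  # stub; a real system would run cheap inference here

    def maybe_review(session_id: str, turns: list[str], quotas: dict) -> None:
        if random.random() > SAMPLE_RATE:
            return  # runs off the request path, so skipping most sessions is fine
        score = classify_abuse("\n".join(turns[-20:]))
        if score > ABUSE_THRESHOLD:
            # semi-retroactive: tighten the quota for the next window instead of blocking mid-session
            quotas[session_id] = int(quotas.get(session_id, 100) * 0.5)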
They’re idiots who hacked together a shockingly useful tool by leveraging the billions of dollars they received from shamelessly hyping up chatbots. The Claude Code leak makes this very clear.
You seem to be implying that the company that employs the best chemists should therefore also make the best cakes. I don't see an obvious reason why this should hold true. I think it's fair to ridicule a bunch of chemists acting as master patissiers.
They're completely vibe-coding one of their flagship products. It's not unreasonable to consider that the people who took that decision are, indeed, idiots.
> most advanced AI labs use rudimentary broken text match
> It's vibe-coded
I called this out when I saw Claude Code CLI source code reach for regex on a certain task a while back and got told it was very unlikely that nobody reviewed the diff. Looks like the bar was lower than imagined.
Maybe running additional inference on all sessions to detect OpenClaw usage would cost more than the detection would save (which is the original goal). I also suspect the Claude Code team is just a regular software team without immediate access to ML pipelines (or the competence to run them) to quickly build a proper abuse-detection system with extensive testing (to avoid false positives, which people would also complain about), and they're under pressure from management to do something right now, so a regex is all they can do within those constraints.
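To be fair, the regex version of that is about five lines (the pattern here is invented for illustration; it's not the actual rule from the leak):

    import re

    # hypothetical text-match heuristic of the kind being described
    SUSPICIOUS = re.compile(r"openclaw|agent-wrapper", re.IGNORECASE)

    def flag_session(turns: list[str]) -> bool:
        # cheap, synchronous, no ML pipeline needed -- and easy to get wrong
        return any(SUSPICIOUS.search(t) for t in turns)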
Y'all remember https://en.wikipedia.org/wiki/Mystery_meat_navigation? Back in the 2004-ish era, there was an explosion of very creative interaction methods, thanks to Flash, browser performance improvements, and general hardware improvements, which led to "mystery meat navigation" and the community's pushback against it.
Since then, the "idiomatic design" seems to have been completely lost.
> I know that there's a deceptively high amount of engineering required for these kinds of things
I think there's a deceptively low amount of engineering required for most medical and medical-adjacent tech. The high costs are rooted in pervasive industry-wide centuries-long FUD campaigns.
I built something similar for Linux (yapyap: push-to-talk with whisper.cpp). The "local is too slow" argument doesn't hold up anymore if you have any GPU at all. whisper large-v3-turbo with CUDA on an RTX card transcribes a full paragraph in under a second. Even on CPU, parakeet is near-instant for short utterances.

The "deep context" feature is clever, but screenshotting and sending to a cloud LLM feels like massive overkill for fixing name spelling. The accessibility API approach someone mentioned upthread is the right call: grab the focused field's content, nearby labels, window title. That's a tiny text prompt a 3B local model handles in milliseconds. No screenshots, no cloud, no latency.

The real question with Groq-dependent tools: what happens when the free tier goes away? We've seen this movie before. Building on local models is slower today but doesn't have a rug-pull failure mode.
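Rough sketch of what that could look like (assuming X11 with xdotool for the window title and a local model behind an Ollama-style /api/generate endpoint; the focused-field read is stubbed out because the exact accessibility call depends on the toolkit):

    import json
    import subprocess
    import urllib.request

    def window_title() -> str:
        # X11 only; Wayland needs a different route
        out = subprocess.run(["xdotool", "getactivewindow", "getwindowname"],
                             capture_output=True, text=True)
        return out.stdout.strip()

    def focused_field_text() -> str:
        # stub: on Linux this would come from AT-SPI, and the call is toolkit-specific
        return ""

    def fix_names(transcript: str, model: str = "llama3.2:3b") -> str:
        prompt = (f"Window: {window_title()}\nField contents: {focused_field_text()}\n"
                  f"Fix only the spelling of names in this dictation:\n{transcript}")
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]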
Yeah, local works really well. I tried this other tool: https://github.com/KoljaB/RealtimeVoiceChat, which lets you chat live with a (local) LLM. With local Whisper and a local LLM (8B Llama in my case) it works phenomenally and responds so quickly that it feels like it's interrupting me.
Too bad that tool no longer seems to be actively developed. I'm looking for something similar. But it's really nice to see what's possible with local models.
My M1 16GB Mini and M2 16GB Air both deliver insane local transcription performance without eating up much memory - I think the M line + Parakeet gives you that speed and privacy for free.
Yeah, that model is amazing. It even runs reasonably well on my mid-range Android phone with this quite simple but very useful application, as long as you don't speak for too long, or you pause every once in a while to let it transcribe. I do have handy.computer on my Mac too.
I find the model works surprisingly well and, in my opinion, surpasses all the other models I've tried. Finally a model that can mostly understand my not-so-perfect English and handle language switching mid-sentence (compare that to Gemini's voice input, which is literally THE WORST: it always tries to transcribe in the wrong language, and even when the language is correct it produces utter crap).
Ack for dictation, but Gemini voice is fun for interactive voice experiments -> https://hud.arach.dev/. Honestly blown away by how much Gemini could assist with, given basically no dev effort.
FWIW whisper.cpp with the default model works at 6x realtime transcription speed on my four-core ~2.4GHz laptop, and doesn't really stress CPU or memory. This is for batch transcribing podcasts.
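In case it's useful, the whole batch loop is roughly this (the binary name varies between whisper.cpp versions, main vs whisper-cli, and the paths and model are placeholders):

    import subprocess
    from pathlib import Path

    MODEL = "models/ggml-base.en.bin"  # the default English model; swap in a larger one if needed

    def transcribe(podcast: Path) -> None:
        wav = podcast.with_suffix(".wav")
        # whisper.cpp wants 16 kHz mono PCM
        subprocess.run(["ffmpeg", "-y", "-i", str(podcast), "-ar", "16000", "-ac", "1", str(wav)],
                       check=True)
        # -otxt writes a .txt transcript next to the input, -t sets the thread count
        subprocess.run(["./main", "-m", MODEL, "-f", str(wav), "-otxt", "-t", "4"], check=True)

    for episode in sorted(Path("podcasts").glob("*.mp3")):
        transcribe(episode)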
The downside is that I couldn't get it to segment by speaker. The consensus seemed to be to use a separate tool for that.
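The separate tool people usually point to is pyannote.audio; something like this, which I haven't wired up yet (model name and token handling are as I remember its docs, so treat them as assumptions), and you then merge the speaker turns with the whisper.cpp timestamps:

    from pyannote.audio import Pipeline

    # requires a (free) Hugging Face token to download the gated model
    pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1",
                                        use_auth_token="hf_...")
    diarization = pipeline("episode.wav")

    for turn, _, speaker in diarization.itertracks(yield_label=True):
        print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")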
Yeah, that's exactly what I started to do with mine. It runs local Whisper with CUDA on a graphics card. Whisper is actually better than any other model that I've seen, even things like Parakeet. It can do language detection. It automatically removes all the ahs and umms unless I specifically include them in my speech. I think this whole paragraph is going to take maybe half a second to process and paste without any issues.
(and it did it perfectly without any edits required for me at all.)
I did the same, called hapi. I also added meeting recordings + automations so i can use those voice notes to trigger stuff or repurpose them, or just save them anywhere i want.
Thanks for surfacing this. If you click the "tools" button to the left of "compile", you'll see a list of comments, and you can resolve them from there. We'll keep improving and fixing things that might be rough around the edges.
Eh. This is yet another "I tried AI to do a thing, it didn't do it the way I wanted, therefore I'm convinced that's just how it is... here's a blog post about it" article.
"Claude tries to write React, and fails"... how many times? what's the rate of failure? What have you tried to guide it to perform better.
These articles are similar to HN 15 years ago, when people wrote "Node.js is slow and bad".