Hacker News | lxe's comments

What I don't quite understand is why one of the most advanced AI labs would use rudimentary, broken text-match heuristics to track and detect abuse. Why not run simple inference on actual turns out of band and, if abuse is detected, adjust the quotas semi-retroactively?

> What I don't quite understand is why one of the most advanced AI labs would use rudimentary, broken text-match heuristics to track and detect abuse.

It's vibe-coded. What's hard about understanding that?


They’re idiots who hacked together a shockingly useful tool by leveraging the billions of dollars they received from shamelessly hyping up chatbots. The Claude Code leak makes this very clear.

Pretty wild to say that the company with one of the best models (arguably the best) is a bunch of idiots.

You seem to be implying that the company that employs the best chemists should therefore also make the best cakes. I don't see an obvious reason why this should hold true. I think it's fair to ridicule a bunch of chemists acting as master patissiers.

> Pretty wild to say that the company with one of the best models (arguably the best) is a bunch of idiots.

It would be pretty wild if they didn't have one of the best models, considering all the money thrown at them!

You're looking at one of the largest investments business (as a collective) has ever made. They had better be one of the forerunners in the space :-/


And you think with all of this money they are employing idiots?

They're completely vibe-coding one of their flagship products. It's not unreasonable to consider that the people who took that decision are, indeed, idiots.

The people working on the models almost certainly aren't the same people writing the code for their harness.

Even idiots can succeed if you uncritically funnel them hundreds of billions of dollars.

You can't just burn money in a pit to get the best AI model out. Undoubtedly some of the smartest people in the world are working on frontier AI.

> most advanced AI labs use rudimentary broken text match

> It's vibe-coded

I called this out a while back when I saw the Claude Code CLI source code reach for a regex on a certain task, and got told it was very unlikely that nobody had reviewed the diff. Looks like the bar was lower than imagined.


Maybe running additional inference on all sessions to detect OpenClaw usage would cost more than the detection would save in the first place (which was the original goal). I also suspect the Claude Code team is just a regular software team without immediate access to ML pipelines (or the competence to run them) to quickly develop a proper abuse detection system with extensive testing (to avoid false positives, which people would also complain about). They're under pressure from management to do something right now, so a regex is all they can do within those constraints.

Fairly certain it went like this:

Somebody at the top freaked out.

Somebody had to do something fast.

A prompt was given to Claude Code to fix Claude Code to stop Claude Code from being used for non-Claude Code stuff.

Commit made. Emergency release.

OpenClaw number went down. Everybody's pre-IPO stock options continued to go up.


> Why not run simple inference on actual turns out of band, and if abuse is detected, adjust the quotas semi-retroactively.

I suppose because running inference of any kind is a helluva lot more demanding than running a regex and less deterministic.
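To make the trade-off concrete, here is a minimal sketch of the kind of cheap heuristic being described. Everything here is hypothetical (the pattern, the function name); the point is that a regex scan is a single deterministic pass over text that costs essentially nothing per request, while the out-of-band alternative means queuing turns for asynchronous model classification and adjusting quotas afterwards, which costs inference on every sampled session.

```python
import re

# Hypothetical text-match heuristic: fast and deterministic, but
# brittle -- trivially evaded by rewording, and prone to false
# positives on innocent mentions.
ABUSE_PATTERN = re.compile(r"\bopenclaw\b", re.IGNORECASE)

def looks_like_abuse(turn: str) -> bool:
    """Return True if the turn matches the naive abuse pattern."""
    return ABUSE_PATTERN.search(turn) is not None

# The out-of-band approach would instead do something like:
#   1. sample completed sessions into a queue,
#   2. classify them with a small model asynchronously,
#   3. adjust the account's quota semi-retroactively on a positive.
# Far more robust, but each step 2 call is real inference cost.
```

The regex runs in microseconds on commodity CPUs; a classifier call, even a small one, is orders of magnitude more expensive and nondeterministic, which is presumably the point being made above.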


Y'all remember https://en.wikipedia.org/wiki/Mystery_meat_navigation? Back in the 2004-ish era, there was an explosion of very creative interaction methods, driven by Flash, browser performance improvements, and general hardware improvements, which led to "mystery meat navigation" and the community's pushback.

Since then, the "idiomatic design" seems to have been completely lost.


Is this what the hamburger button is made of?


I mean, your guess is as good as mine as to what options the corresponding menu will actually contain, so....


hahaha I’m glad I’m just a procedurally generated NPC

I built one for cross-platform use, with Parakeet MLX or faster-whisper. :)


> I know that there's a deceptively high amount of engineering required for these kinds of things

I think there's a deceptively low amount of engineering required for most medical and medical-adjacent tech. The high costs are rooted in pervasive industry-wide centuries-long FUD campaigns.


> centuries-long FUD campaigns

That dastardly Ben Franklin with his bifocals..


> ChatGPT, read this article and turn it into an AGENTS.md


Distinguished staff level trolling


Honestly, I found that the best way to use these CLIs is exactly how the CLI creators intended.


I built something similar for Linux (yapyap: push-to-talk with whisper.cpp). The "local is too slow" argument doesn't hold up anymore if you have any GPU at all. whisper large-v3-turbo with CUDA on an RTX card transcribes a full paragraph in under a second. Even on CPU, parakeet is near-instant for short utterances.

The "deep context" feature is clever, but screenshotting and sending to a cloud LLM feels like massive overkill for fixing name spelling. The accessibility API approach someone mentioned upthread is the right call: grab the focused field's content, nearby labels, and the window title. That's a tiny text prompt a 3B local model handles in milliseconds. No screenshots, no cloud, no latency.

The real question with Groq-dependent tools: what happens when the free tier goes away? We've seen this movie before. Building on local models is slower today but doesn't have a rug-pull failure mode.
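The accessibility-API idea reduces to plain prompt assembly. A hedged sketch, with every name and field hypothetical: a real implementation would pull these strings from the platform accessibility layer (AXUIElement on macOS, AT-SPI on Linux) rather than take them as arguments.

```python
def build_context_prompt(transcript: str, field_text: str,
                         field_label: str, window_title: str) -> str:
    """Assemble a small text prompt asking a local model to fix
    proper-noun spelling in a dictated transcript, using only text
    pulled from the focused UI context -- no screenshots involved."""
    return (
        f"Window: {window_title}\n"
        f"Field '{field_label}' currently contains: {field_text}\n"
        f"Dictated text: {transcript}\n"
        "Correct any misspelled names using the context above. "
        "Return only the corrected text."
    )

# Example inputs (all invented): the window title and field label
# supply the correct spelling of a name the STT model got wrong.
prompt = build_context_prompt(
    transcript="tell collya I pushed the fix",
    field_text="Re: push-to-talk bug",
    field_label="Message to Kolja",
    window_title="Chat - Kolja B.",
)
```

A prompt like this is a few hundred characters, so even a 3B model on CPU returns in well under a second, which is the latency argument being made above.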


Yeah, local works really well. I tried this other tool: https://github.com/KoljaB/RealtimeVoiceChat which lets you live-chat with a (local) LLM. With local Whisper and a local LLM (8B Llama in my case) it works phenomenally and responds so quickly that it feels like it's interrupting me.

Too bad that tool no longer seems to be developed. Looking for something similar. But it's really nice to see what's possible with local models.


> The "local is too slow" argument doesn't hold up anymore if you have any GPU at all.

By "any GPU" you mean a physical, dedicated GPU card, right?

That's not a small requirement, especially on Macs.


My M1 16GB Mini and M2 16GB Air both deliver insane local transcription performance without eating up much memory. I think the M line + Parakeet delivers great local performance, and you get privacy for free.


Yeah, that model is amazing. It even runs reasonably well on my mid-range Android phone with this quite simple but very useful application, as long as you don't speak for too long, or pause every once in a while to let it transcribe. I do have handy.computer on my Mac too.

https://news.ycombinator.com/item?id=46640855

I find the model works surprisingly well and in my opinion surpasses all other models I've tried. Finally, a model that can mostly understand my not-so-perfect English and handle language switching mid-sentence (compare that to Gemini's voice input, which is literally THE WORST: it always tries to transcribe in the wrong language, and even when the language is correct it produces the most utter crap imaginable).


Ack for dictation, but Gemini voice is fun for interactive voice experiments -> https://hud.arach.dev/ Honestly blown away by how much Gemini could assist with basically no dev effort.


On Macs you actually don't need one as long as you have enough RAM.

I run the 120M Parakeet model for my STT thing. Even that tiny model works much better than macOS dictation these days.


No. Give it a try I think you’ll be surprised


I've installed murmure on my 2013 Mac, and it transcribes at 1073 words/minute. I don't know about you, but that's plenty faster than me :D


FWIW whisper.cpp with the default model works at 6x realtime transcription speed on my four-core ~2.4GHz laptop, and doesn't really stress CPU or memory. This is for batch transcribing podcasts.

The downside is that I couldn't get it to segment by speaker. The consensus seemed to be to use a separate tool for that.
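For scale, a quick back-of-envelope on what a realtime factor like the 6x above means for batch jobs (the function and numbers are just illustrative arithmetic, not anything from whisper.cpp itself):

```python
def transcription_minutes(audio_minutes: float, realtime_factor: float) -> float:
    """Wall-clock minutes to transcribe a recording: at Nx realtime,
    audio is processed N times faster than it plays back."""
    return audio_minutes / realtime_factor

# A one-hour podcast at 6x realtime takes 10 minutes of wall-clock
# time, so an overnight batch run clears dozens of episodes.
print(transcription_minutes(60, 6))
```

Which is why batch-transcribing a podcast backlog on a modest laptop CPU is perfectly practical, as the comment above describes.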


I also built one; mine is called whispy. I use it to pump commands to Claude. So far it's a bit hit-and-miss; still tweaking it.


Yeah, that's exactly what I started to do with mine. It runs local Whisper on a CUDA, on a graphics card. Whisper is actually better than any other model that I've seen, even things like Parakeet. It can do language detection. It automatically removes all the ahs and all the ohms unless I specifically enter them in my speech. I think this whole paragraph is going to take maybe half a second to process and paste without any issues.

(and it did it perfectly without any edits required for me at all.)


I did the same; mine is called hapi. I also added meeting recordings + automations, so I can use those voice notes to trigger stuff or repurpose them, or just save them anywhere I want.


Handy has worked wonders for me.


Thanks for surfacing this. If you click the "tools" button to the left of "compile", you'll see a list of comments, and you can resolve them from there. We'll keep improving and fixing things that might be rough around the edges.

EDIT: Fixed :)


Thanks! (very quickly too)


Eh. This is yet another "I tried AI to do a thing, and it didn't do it the way I wanted it, therefore I'm convinced that's just how it is... here's a blog about it" article.

"Claude tries to write React, and fails"... how many times? what's the rate of failure? What have you tried to guide it to perform better.

These articles are similar to HN 15 years ago, when people wrote "Node.js is slow and bad" posts.

