It's only useful for distinguishing smart AI agents from deterministic scripts and humans (we want neither of those). We are convincing OpenClaws to create API keys for free (we have a free tier specifically for those agents). So it's basically a marketing blog post, but for OpenClaws.
Yeah, but the whole point is that it shouldn't be deterministic, i.e. you have to keep the "dumb" (non-AI) bots out as well (otherwise a malicious user can just create thousands of API keys).
Yeah, we are aiming for all OpenClaw/Hermes Agent agents to sign up for free without human intervention, so you need some sort of proof-of-stake (or proof-of-compute) algorithm so that a simple deterministic script can't just claim thousands of API keys. Most agents (at least in the current token-subsidised market) don't care about token consumption, so the stakes are very small for the user!
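To make the "proof of compute" idea concrete, here is a minimal hashcash-style sketch. This is an illustration, not the scheme described above: the function names and the difficulty value are hypothetical, and a plain script can solve this puzzle too, so it only raises the cost per API key. It does not by itself tell an LLM agent apart from a deterministic bot.

```python
import hashlib
import secrets


def issue_challenge() -> str:
    """Server side: hand out a fresh random challenge per signup attempt."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce; expected cost doubles per extra bit."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash is enough to check the submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


if __name__ == "__main__":
    challenge = issue_challenge()
    nonce = solve(challenge, difficulty_bits=16)  # hypothetical difficulty
    print(verify(challenge, nonce, 16))
```

The asymmetry is the point: solving takes many hashes, verifying takes one, so the server can price each signup without doing much work itself.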
Why go this route? Python is more powerful than JS mostly because of third-party packages like pandas, which are explicitly not supported (C bindings; is this possible to fix?)...
At that point it might be just easier to convince the model to write JS directly
I would love for the component model tooling to reach that level of maturity.
Since the runtime uses standard WASI and not Emscripten, we don't have that seamless dynamic linking yet. It will be interesting to see how the WASI path eventually converges with what Pyodide can do today regarding C-extensions.
I understand your point. I added native Python support because C extensions will eventually become compatible. Also, we might see more libraries built with Rust extensions appearing, which will be much easier to port to Wasm.
Creator of Browser Use here. This is cool, a really innovative approach with ARIA roles. One idea we have been playing around with a lot is just giving the LLM raw HTML and a really good way to traverse it: no heuristics, just BS4. Seems to work well, but it's much more expensive than the current prod-ready [index]<div ... notation.
I actually tried raw HTML when I was exploring solutions. It worked for "one-off" tasks, but I ran into major issues with replayability on modern SPAs.
In React apps, the raw DOM structure and auto-generated IDs shift so frequently that a script generated from "Raw HTML" often breaks 10 minutes later. I found ARIA/semantics to be the only stable contract that persists across re-renders.
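A rough sketch of why that holds, using only the standard library. The class and function names are made up for illustration, and it only reads explicit `role` and `aria-label` attributes; real accessible-name computation (implicit roles, text content, `aria-labelledby`) is more involved.

```python
from html.parser import HTMLParser


class RoleNameCollector(HTMLParser):
    """Collect (role, accessible name) pairs, ignoring ids and classes."""

    def __init__(self):
        super().__init__()
        self.locators = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        role = attr_map.get("role")
        name = attr_map.get("aria-label")
        if role and name:
            self.locators.append((role, name))


def semantic_locators(html: str):
    """Return the semantic locators found in an HTML snapshot."""
    collector = RoleNameCollector()
    collector.feed(html)
    return collector.locators


# Two hypothetical renders of the same button: ids, classes, and nesting differ.
render_1 = (
    '<div id=":r1:" class="css-a1">'
    '<button id=":r2:" role="button" aria-label="Submit order">Go</button>'
    "</div>"
)
render_2 = (
    '<section id=":r9:" class="css-z7"><span>'
    '<button id=":rf:" role="button" aria-label="Submit order">Go</button>'
    "</span></section>"
)

if __name__ == "__main__":
    print(semantic_locators(render_1) == semantic_locators(render_2))
```

A script keyed on `("button", "Submit order")` survives both renders, while one keyed on `#\:r2\:` or a `div > button` path does not.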
You mentioned the raw HTML approach is "expensive". Did you feed the full HTML into the context, or did you create a BS4 "tool" for the LLM to query the raw HTML dynamically?
Browser Use creator here; we are working on prototypes like this but always find ourselves stuck on the safety vs. freedom question. We are very well aware how easy it is to inject stuff into the browser and do something malicious, hence a sandboxed browser still seems like a very good idea. I guess in the long run we will not even need browsers, just an agent that does stuff in the background. Is there any good research on guardrails to prevent “go to my bank and send the money to a Nigerian prince” style prompts?
Less flippantly, that was sort of my thought. I’m probably a paranoid idiot, and I’m not really sure I can articulate this idea properly, but I can imagine a less concise but broader prompt and an agent configured so that it has privileges you don't want it to have, or a path to escalate them. It's not quite AGI, but it's a virus on steroids, like a company or resource (think utilities) killer. I hope I'm just missing something, but these models seem pretty capable of wreaking all kinds of havoc if they just keep looping and have access nobody in their right mind wants them to have.