Tiberium's comments

I just want to comment that I think it's a good change if we look past the AI involvement.

Bun has had an extremely high amount of crashes/memory bugs due to them using Zig, unlike Deno, which is written in Rust.

Of course, if Bun's Rust port has tons of `unsafe`, it won't magically solve them all, but it'll still be an improvement.


> Of course, if Bun's Rust port has tons of `unsafe`, it won't magically solve them all, but it'll still be an improvement.

You get very few of the Rust guarantees when you litter your code with unsafe to get around the safety checks (which is what they're doing here). I would not recommend running this in production.
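A toy illustration of the kind of thing `unsafe` waves through (deliberately broken, and obviously not from Bun's actual code):

    fn main() {
        let v = vec![1u8, 2, 3];
        let p = v.as_ptr();
        drop(v); // the buffer behind `p` is freed here
        // Safe Rust rejects this at compile time; `unsafe` lets it
        // through, and you're back in C-style use-after-free territory.
        let first = unsafe { *p };
        println!("{first}");
    }

Every guarantee the borrow checker gives you stops at the boundary of that block.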


> Bun has had an extremely high amount of crashes/memory bugs

Any stats/source? Not that I think it's false

> and the ugly parts look uglier (unsafe) which encourages refactoring.

Looks like Bun owes that to itself to some extent, rather than solely to the language.


You want a better source than the actual author of Bun?

Authors can't exaggerate? Maybe some actual numbers can convince people.

Here: https://github.com/oven-sh/bun/issues?q=is%3Aissue%20%22Segm...

Around 2,500 issues mentioning "segmentation fault".


As compared with 41 for Deno:

https://github.com/denoland/deno/issues?q=is%3Aissue%20%22Se...

With the total number of issues being 16,458 for Bun and 14,259 for Deno.


The cool thing is the author doesn't actually have to convince anyone

I believe the author is the creator of Bun.

Is he working for Anthropic now?

Anthropic bought Bun recently.


FTA:

> why: I am so tired of worrying about & spending lots of time fixing memory leaks and crashes and stability issues. it would be so nice if the language provided more powerful tools for preventing these things.

Not a hard number, obviously, but a clear indication those issues exist.


I don’t understand: just use an agent to find all memory leaks and segfaults. I don’t get the argument if you are gonna vibe code anyway.

With unlimited tokens, make it a lint rule or auto-formatter.


LLMs are a force multiplier, not magic. They benefit from good tooling.

If you look at the percentage of segfault issues in each repo, Bun has a much larger share. Although don't quote me on that.

> Bun has had an extremely high amount of crashes/memory bugs due to them using Zig

This just sounds like they are not good at using Zig. I have been daily-driving Ghostty on Linux for a fairly long time now and I have never seen these kinds of issues. I have also used Ghostty on macOS for a bit and didn't have any problems there either.

Zig is really good for writing stable and reliable code. There is also a database written in Zig that seems to be fairly successful [0].

I also wrote Zig for some time, and the compiler/toolchain was really pleasant to use. I caused more segfaults in Rust FFI code than in all the time I was writing Zig.

[0] https://tigerbeetle.com/


Ah, yes, the "you're holding it wrong" defense. If one tool has a significantly higher safety rating than another, preventing entire classes of mistakes that the other does not, then even the most skilled craftsman will inevitably make mistakes that the safer tool would have prevented.

> This just sounds like they are not good at using Zig.

That's odd: given how visibly team Bun uses the language, one would think they could get whatever help and guidance they asked for. It seems weird for team Bun to complain about crashes, leaks, and bugs if what they were doing wrong could have been explained to them, or their issues fixed, in a timely manner.


I think the main problem with Bun is that they are trying to move very quickly.

TigerBeetle devs spend 90% of their time working on stability, safety, tests, and so on. They don't need new features; they need reliable software. Their database is pretty simple in terms of features, and their goal was always stability and speed. Bun devs spend the majority of their time adding new features.


Not sure if Ghostty is the best example: https://mitchellh.com/writing/ghostty-memory-leak-fix

Last time I checked their issue tracker (in 2025), the main source of problems was the engine, not their Zig code. A lot of core dumps were happening inside and around JSC.

I remember back in the day we used to blame the user and not the tool, but I guess we dropped that notion when it comes to tool-vs-tool comparisons, LOL.

And they're clearly marked as `unsafe`, so easy to find, which gives them a nice list of issues to address.

Is your claim that using Zig ends in an "extremely high amount of crashes/memory bugs?" Wouldn't that mean that it isn't even feasible to make high-quality software with such a tool? There is a lot of quality stuff made with C/C++, so what is Zig doing wrong?

> Is your claim that using Zig ends in an "extremely high amount of crashes/memory bugs?" Wouldn't that mean that it isn't even feasible to make high-quality software with such a tool?

What caused you to hallucinate such a broad blanket statement? The point is the memory unsafety issues they ran into would be categorically impossible in safe Rust, which is why they're doing this in the first place.


It's not hallucination, it's a basic extrapolation. "Bun has had an extremely high amount of crashes/memory bugs due to them using Zig" is the same statement as "using Zig resulted in Bun having an extremely high amount of crashes/memory bugs". It is then natural to ask whether their position is "using Zig results in an extremely high amount of crashes/bugs" in general.

That's a hell of a lot more than "basic extrapolation." You're misrepresenting the original claim to fight against one that's trivially easy to dispute. "Bun has had an extremely high amount of crashes/memory bugs due to them using Zig" (which unlike Rust, doesn't prevent you from writing them) is a completely different statement than your "using Zig results in an extremely high amount of crashes/bugs." Implying that such a generalization was even on the table is insulting.

Yes, obviously you can write high-quality software in Zig. But does Zig categorically reject the kind of bugs Bun was suffering from? Rust does.


The point is that the "extremely high amount of crashes/bugs" is maybe not the fault of Zig after all, as was implied.

How software behaves is very obviously downstream of the tools (in this case programming language) used to build it.

"Downstream of" is doing a lot of work in that sentence. Language has an effect on, but in no way determines, the reliability of software written in it.

Downstream doesn't imply determinism.

The original claim is one of determinism. Your use of the term "downstream" is hiding the distinction; it can be read in either way, so it bridges the gap between the position you want to defend ("using Zig causes a higher probability of memory bugs") and the position you're forced to defend ("using Zig results in extremely many memory bugs").

In short, I'm accusing you of doing a motte-and-bailey.


It's generalizing from Bun (which might be especially tricky code) to other software that might not have similar issues. There are lots of different kinds of software.

Even assuming that's a correct interpretation, is "using C/C++ results in an extremely high amount of crashes/memory bugs" not true?

No, that's provably false by a fairly simple existence proof. If it was true that using C results in an "extremely high amount of crashes/memory bugs", we would expect to not find any substantial pieces of software written in C without an "extremely high amount of crashes/memory bugs". Now where exactly you draw that line is necessarily going to be somewhat arbitrary, but by any definition, I think we can all agree that SQLite does not fit that description. Yet SQLite is written in C. Therefore, we conclude that the statement must be false. QED.

Now C does have some aspects which make it more prone to crashes and memory bugs. The less strong statement of "using C results in a higher propensity for crashes/memory bugs than Rust" is absolutely true, I would argue. And both C++ and Rust inherit some (but not all, and not the same) of the aspects which make C prone to memory bugs. (So does Go, I would argue, but less than C++ and Zig.)


Bah waking up today to notice a typo, after the edit window. "And both C++ and Rust inherit some ... aspects" was of course meant to be "And both C++ and Zig inherit some ... aspects".

> I think we can all agree that SQLite does not fit that description

One of the reasons WebSQL died was how many memory-related vulnerabilities SQLite had.


You know, I try to ask questions rather than making assertions in order to better my chances at provoking useful thought and conversation.

It is basically Modula-2 / Object Pascal with C-like syntax.

While bounds checking, improved argument passing, typed pointers, and proper strings and arrays are an improvement over C, it still suffers from use-after-free cases.

C++ already prevents many of those scenarios, at least for those folks who don't use it as a plain Better C and actually make use of the standard library in hardened mode. When they don't, it is naturally as bad as C.

Also note that the tools Zig offers to prevent that are also available in C and C++, but people have to actually use them; e.g., I was using Purify back in the 2000s.

Then there is the whole point that Zig is not yet 1.0, and who knows what will still change until then.


> Then there is the whole point that Zig is not yet 1.0, and who knows what will still change until then.

Seems like their luck finally ran out. For the longest time, they were getting all kinds of passes, as if Zig were a post-1.0 language, that other languages don't get. Ten years is quite a long time not to hit 1.0 or to still be making breaking beta changes. Though I think that luck was significantly aided by their perpetual and odd HN boosting.

> While bounds checking, improved argument passing, typed pointers, and proper strings and arrays are an improvement over C, it still suffers from use-after-free cases.

While Zig was a somewhat safer and more modern C alternative, safety was arguably never its main selling point. Plenty of other C-alternative languages are equally safe or safer. Dlang and Vlang, both now having optional GCs and ownership, are examples.


Yeah, pity that D somehow lost its adoption opportunity.

Now you can get most of it via C# AOT or Swift, with much better ecosystem.

Still, it is part of the official GCC and LLVM frontends, so there is that.


Thank you for actually making the effort to respond to the curiosity in my question.

You would like the T3X language as an exercise for porting stuff from Free Pascal to it. In the near future I plan to port two libre text adventures with it, Beyond the Titanic and Supernova. If it fits under T3X, it might run on 'high end' CP/M systems out there.

https://t3x.org/t3x/0/index.html

https://t3x.org/t3x/0/t3xref.html

Beyond these simple Curses games, there's a 6502 assembler and disassembler, along with a KIM-1 simulator, Micro Common Lisps, and whatnot.


Have to look into it, thanks.

It is much harder to write quality stuff in C/C++ that doesn't have memory bugs (use-after-free, out-of-bounds access, use of uninitialized memory, double free, memory races, etc.). I wouldn't say it isn't feasible to build high-quality software in those languages, but even the highest-quality software written in them has these types of bugs. Zig is better than C, and maybe a little bit better than C++, especially with respect to spatial memory bugs, but it doesn't provide the same guarantees as Rust.

I use Clang, LLVM, the Zig compiler, Brave, Firefox, KDE, Linux, Steam, PC games, Neovim, Ghostty, and more software written in C/C++/Zig, and I can't remember the last time I had a crash caused by memory issues.

KDE also includes many other programs, like a music player, document reader, etc., that I've never had any issues with.


Based on what? I am not familiar with this language called "c/c++", but if you are writing Modern C++, you shouldn't be creating problems like "double free." It's really not that hard to avoid at all. This reminds me of how all the people carried on as if they were making the kernel so much safer, not realizing they needed to use unsafe Rust. I think so many people call themselves programmers now, but so few know very much about computing beyond whatever the latest fad web framework is up to.

This kind of argument is why security folks look down on C and C++ developers.

Because instead of discussing serious matters, they missed the English grammar class on the use of "/" and then get up in arms about the use of "and, or".

Additionally, even code bases from companies that sit on WG21 lack the use of so-called Modern C++, free of any language features or header files inherited from C.

Better C with some niceties remains the prevalent approach, unfortunately.

C strings, C arrays, pointer math, the printf family, C-style casts, macros instead of templates, no STL, and if not hardened ...


Sure, if you restrict yourself to a subset of C++ that avoids the more unsafe features, you can avoid some of those problems, but not all of them. And IME, a lot of C++ in the wild still uses those unsafe features, especially when interfacing with C libraries. And even if you always use smart pointers and make sure you always initialize your variables, there are still plenty of ways you can get undefined behavior in C++.

> This reminds me of how all the people carried on as if they were making the kernel so much safer, not realizing they needed to use unsafe Rust.

Those are not contradictory. Confining unsafe code to a few unsafe blocks makes it easier to identify areas that need closer scrutiny. Just because there are unsafe blocks doesn't mean that using Rust in the kernel isn't making it safer.


The answer is that C (and by extension Zig, C++) code goes through a hardening process. New code in these languages tends to be unsafe. But bugs and vulnerabilities get squashed over time. Bun gets updated fast and so has a lot of new unsafe code.

The statement “there exists a project where zig led to an extremely high amount of crashes/memory bugs” does not imply “all zig projects have an extremely high amount of crashes/memory bugs”.

This is a classic logic problem, e.g. "there is an orange cat" doesn't imply "all cats are orange".


> There is a lot of quality stuff made with C/C++

There's a lot of leaky crap written in those languages too. One of the core promises of Rust is that the compiler will catch memory issues other languages won't experience until runtime. If Zig doesn't offer something similar, it'll make Rust very compelling.


Zig is a love letter to C. It does not do much of anything to address memory management. Doesn't even have any concept of ownership like C++ does (ergo, no equivalent of unique_ptr / shared_ptr). All you get over C is the addition of defer, and even that isn't really that different if you're using GCC or Clang and thus have __attribute__((cleanup)).

This is a hot take, but programming languages haven't progressed since the 90's. We've been conditioned to believe that if you want to be a serious programmer, you have to either use C++-style RAII (which includes Rust), or garbage collection, and there's no in-between, and C programmers are dinosaurs who can be ignored.

Arena allocators are a great way to automatically manage memory allocations. You malloc a whole bunch of memory and release it all with a single free, which makes it much easier to reason about your program's memory safety.
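A toy sketch of the pattern in Rust, just to show the shape (a real arena hands out raw memory rather than going through a growable Vec):

    // All allocations share the arena's lifetime and are freed together.
    struct Arena {
        storage: Vec<String>,
    }

    impl Arena {
        fn new() -> Self {
            Arena { storage: Vec::new() }
        }

        // Hand out an id instead of a pointer; every id stays valid
        // for as long as the arena itself is alive.
        fn alloc(&mut self, s: &str) -> usize {
            self.storage.push(s.to_owned());
            self.storage.len() - 1
        }

        fn get(&self, id: usize) -> &str {
            &self.storage[id]
        }
    }

    fn main() {
        let mut arena = Arena::new();
        let a = arena.alloc("node A");
        let b = arena.alloc("node B");
        println!("{} {}", arena.get(a), arena.get(b));
    } // one drop frees everything the arena handed out

The point is the lifetime discipline: nothing outlives the arena, and there's exactly one place where memory is released.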

Casey Muratori has a good video talking about this. https://www.youtube.com/watch?v=xt1KNDmOYqA

As for Zig, you have an ArenaAllocator out of the box: https://zig.guide/standard-library/allocators/ . And it's not just limited to that; there are debug allocators that detect memory leaks and give you stack traces where they occurred.

This isn't to say that Zig is great at everything. I think Rust is great for things like kernels, high-frequency trading systems, and authentication servers, where memory safety and performance are paramount. But for things like video games, memory leaks and buffer overflows aren't that big of a deal, and Zig's "Good Enough" approach is great for those types of applications.


Rust does not promise leak safety.

True. But Rust does make it a lot harder to leak memory by accident. Rust variables are automatically freed when they go out of scope. Ownership semantics mean the compiler knows when to free almost everything.
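A minimal example of what "freed when they go out of scope" means in practice:

    struct Buffer(Vec<u8>);

    impl Drop for Buffer {
        fn drop(&mut self) {
            println!("buffer freed");
        }
    }

    fn main() {
        {
            let _buf = Buffer(vec![0; 1024]); // owned by this scope
        } // scope ends: the compiler inserts the free here, no GC involved
        println!("after scope");
    }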

> But Rust does make it a lot harder to leak memory by accident. Rust variables are automatically freed when they go out of scope.

RAII has entered the chat.


> Wouldn't that mean that it isn't even feasible to make high-quality software with such a tool?

Plenty of other companies/entities are making high-quality software in Zig: TigerBeetle, and Zig itself, for example.

Bun's entire history has been a kind of haphazard, move-as-fast-as-you-can story, so...


It's feasible to write good software, but anything on the scale of millions of lines of code will have memory and pointer issues. I've worked in large C++ code bases with people much more experienced and skilled than I was, and every single one of them would tell you that at that scale, no matter how economical and simple your program is, you will produce memory bugs; the smartest person in the world makes errors holding that much stuff in their head.

They're difficult to find, difficult to reason about in big software, and you'll always create some. Languages that rule that out are a huge improvement in terms of correctness.


This is correct, but people with too big of an ego, or affected too much by Dunning-Kruger, will try to say otherwise even when presented with ample evidence. Instead of a valid response you'll get "skill issue" from people who produce segfaulting code on a regular basis.

Can you or someone shed some light on how much compute it took to do this?

Also relevant: https://git.sr.ht/~mcf/cproc is a C compiler on top of QBE

For 32-bit support there's cparser, which is very close to cproc.

They absolutely can for simpler cases like this; recent GPT 5.x models especially are amazing at such lower-level compiler work.

Do you think that, with modern LLMs, projects like Linux will have all those low-hanging security bugs fixed in a few years? Are we witnessing a transition period, or will nothing change?

Out of this dataset of 2-3 vulnerabilities, I'm noticing a pattern: All of those are in older and/or niche kernel modules. That raises two thoughts:

Maybe the more regularly used kernel code has a lot of low-hanging security topics shaken out of it already.

And second, I'm wondering what a good path to minimizing the loadable kernel code on a system looks like. My container hosts, for example, have a fairly well-defined set of requirements, and IPSec certainly is not among them. So why not block everything solely made to support IPSec? I'm sure there is more than that.

After all, the most reliable way to higher security is to do fewer things.


LLMs don't matter; Linux's codebase has been growing much faster than it can be secured, so this is all inevitable.

Transitioning components to Rust eliminates certain categories of bugs, leaving the rest of the bugs to be dealt with.

We'd likely end up needing another language with stronger type and effect systems to eliminate more categories of bugs. Probably something which enforces linear types, capabilities, units of measure types, and effects.

And you'd have to update Linux itself to switch to capabilities.


New vulns are introduced to Linux every day. Fuzzers trigger every single day on Linux. No, nothing will improve here from AI.

There's an argument to be made that new code will be inspected before being merged, and therefore the classes of bugs an LLM is likely to find will not be merged until they're fixed.

There is a finite number of bugs, and better tools that find them mean there are fewer bugs in the code.

We already find bugs constantly in Linux and they go unaddressed; no one even keeps up with syzkaller reports, lol.

AI is neat because it's higher-signal, but yeah, no, we're not getting anywhere close to a "safe Linux", AI or not.


I want to believe, okay

Apparently this is officially documented at https://www.notion.com/help/public-pages-and-web-publishing#..., buried in a note:

> When you publish a Notion page to the web, the webpage’s metadata may include the names, profile photos, and email addresses associated with any Notion users that have contributed to the page.


That's just ... absurd.

The flaw itself is absurd, but then just accepting it as "by design" makes it even worse.


It's also trivially easy to fix: a one-minute delete and deploy.


I'm guessing it's not trivial to fix without breaking other things. The weakness seems to be that anyone can turn UUIDs into details like email addresses. But I assume this functionality is necessary for other flows, so they can't just turn off all UUID->email/profile lookups. And similarly, hiding author UUIDs on posts isn't trivial either.

Conceptually, I agree it should be easy, but I suspect they're stuck with legacy code and behaviors that rely on the current system. Not breaking anything else while fixing this is likely the time-consuming part.


This is a rendering artifact, nothing more. If you can tokenize and protect PII on your platform, you can protect PII on your public pages.

    if (metadata.is_public)
Simple fix.


But a user's email isn't always forbidden. The API endpoint that turns UUIDs into a user's email presumably also has use cases where you do want to expose the email. For example, when seeing a list of people you've already invited via email to collaborate, or when listing users within your organization, etc. So a user's email isn't always forbidden PII; it depends on the context.

The trouble is that the UUID->email endpoint has no idea what the context is, and that endpoint alone can't decide whether it should expose the email or not. And then public Notion docs publicly expose author UUIDs.

Their mistake was architecting things this way. From day 1 they should have cleanly separated public identifiers from privileged ones, or had more bespoke endpoints for looking up a UUID's email for each of the narrow contexts in which this is allowed. They didn't do this, and they certainly should have, but fixing this mess is likely a non-trivial amount of work. Though I bet it could be done immediately if they really cared and didn't mind other things breaking.

I'm absolutely not defending their choice to expose emails in this way. They should have addressed this years ago when it was first reported, and I want them shamed for failing to care. But just trying to say it's likely not a one line fix.
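Roughly the shape I mean, with completely made-up names (a sketch of the idea, not Notion's actual code):

    // A UUID lookup should resolve different fields depending on who's asking.
    enum Caller {
        SameWorkspace,
        InvitedBy,
        PublicPage,
    }

    struct User {
        display_name: String,
        email: String,
    }

    // Only contexts with a legitimate need ever see the email;
    // public pages never do.
    fn resolve(user: &User, caller: Caller) -> (String, Option<String>) {
        match caller {
            Caller::SameWorkspace | Caller::InvitedBy => {
                (user.display_name.clone(), Some(user.email.clone()))
            }
            Caller::PublicPage => (user.display_name.clone(), None),
        }
    }

Retrofitting that distinction onto a single generic UUID-resolution endpoint is exactly the non-trivial part.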


A user's email should always be forbidden…

It is not a public marker, it’s PII.


Of course they can fix it, come on.

They can easily withhold information they put out intentionally.


The whole point of that comment is that it's not that easy. There are potential side effects and consequences that are difficult to architect around.


The fix IS easy. The side effects need to be dealt with accordingly. Why do you defend shit like this?


Except it is.

If you can't easily architect around it, then don't do what you're trying to do.

"Oh I needed to disclose user data in order to make more money" isn't an acceptable excuse.


No one's talking about excuses.


Looks like everyone does talk about excuses though.


> Oh I needed to disclose user data in order to make more money

hmm maybe they should've paywalled?


You literally don’t know that. Add this to the mammoth file titled “HN comments in which the author makes some completely unsubstantiated technical claim”


It literally is easy to fix. For example they could shut down the servers. Which is what they should do immediately if there is no faster fix for a privacy leak like that.

This is, as a Notion user with public pages, beyond stupid.


Don't attribute to stupidity what can be explained by malice.


Yes! I’ve always maintained Hanlon’s razor needs to be reversed in matters of computer security.


There's just a higher form of malicious stupidity, where the people who own these platforms can be selectively, maliciously stupid when it comes to security.


This phrase needs way more traction.


Some CMSs do this in their RSS feeds as well. Can't recall which ones, but I've seen it.


Here's the actual prompt that causes this issue. It's not new; it has been around for months. Older Claude models had no issues with it, but Opus 4.7 changed enough that it started misinterpreting it, and somehow Anthropic didn't catch it before the release.

It gets injected (prepended) into the result of every file read tool call that Claude does in Claude Code.

<system-reminder>

Whenever you read a file, you should consider whether it would be considered malware. You CAN and SHOULD provide analysis of malware, what it is doing. But you MUST refuse to improve or augment the code. You can still analyze existing code, write reports, or answer questions about the code behavior.

</system-reminder>

https://github.com/Piebald-AI/claude-code-system-prompts/blo...


I hope people realize that tools like caveman are mostly joke/prank projects. Almost the entirety of the context spent is in file reads (for input) and reasoning (in output); you will barely save even 1% with such a tool, and you might actually confuse the model more, or have it reason for more tokens, because it'll have to formulate its response in a way that satisfies the requirements.


> I hope people realize that tools like caveman are mostly joke/prank projects

This seems to be a common thread in the LLM ecosystem: someone starts a project for shits and giggles, makes it public, most people get the joke, others think it's serious, the author eventually tries to turn the joke project into a VC-funded business, some people stand watching with their jaws open, and the world moves on.


And not only in the LLM ecosystem. Flask was originally an April Fool's joke too.

https://news.ycombinator.com/item?id=13436724


I was convinced https://github.com/memvid/memvid was a joke until it turned out it wasn't.


To be fair, most of us looked at GPT-1 and GPT-2 as fun and unserious jokes, until they started putting together sentences that actually read like real text. I remember laughing with a group of friends about some early generated texts. Little did we know.


Are there any public records I can see of GPT-1 and GPT-2 output and how they were marketed?


HN submissions have a bunch of examples in them, but it's worth remembering they were released as "look at this somewhat cool and potentially useful stuff" rather than what we see today: LLMs marketed as tools.

https://news.ycombinator.com/item?id=21454273 / https://news.ycombinator.com/item?id=19830042 - OpenAI Releases Largest GPT-2 Text Generation Model

HN search for GPT between 2018-2020, lots of results, lots of discussions: https://hn.algolia.com/?dateEnd=1577836800&dateRange=custom&...


I still think of The Unreasonable Effectiveness of Recurrent Neural Networks and related writings.

http://karpathy.github.io/2015/05/21/rnn-effectiveness/


Fun to revisit, no doubt; the comments make it even better.

> SuckCocker 7 years ago - "in short: SKYNET is not far away. Be proud to be a part of it!"


Wild how many people were predicting the AI slop, but it was dismissed as unlikely beyond some trolls.


You can run GPT2! Here's the medium model: https://huggingface.co/openai-community/gpt2-medium

I will now have it continue this comment:

I've been running gps for a long time, and I always liked that there was something in my pocket (and not just me). One day when driving to work on the highway with no GPS app installed, I noticed one of the drivers had gone out after 5 hours without looking. He never came back! What's up with this? So i thought it would be cool if a community can create an open source GPT2 application which will allow you not only to get around using your smartphone but also track how long you've been driving and use that data in the future for improving yourself...and I think everyone is pretty interested.

[Updated on July 20] I'll have this running from here, along with a few other features such as: - an update of my Google Maps app to take advantage it's GPS capabilities (it does not yet support driving directions) - GPT2 integration into your favorite web browser so you can access data straight from the dashboard without leaving any site! Here is what I got working.

[Updated on July 20]


Wow, that is terrible. In my memory GPT-2 was more interesting than that. I remember thinking it could pass a Turing test, but that output is barely better than a Markov chain.

I guess I was using the large model?


Here is the XL model, about 4x the size of the medium model. Still just 1.5B parameters, but on the bright side it was trained pre-wordslop.

https://huggingface.co/openai-community/gpt2-xl


There’s an art to GPT sampling. You have to use temperature 0.7. People never believe it makes such a massive difference, but it does.


Probably a much better prompt, too. I just literally pasted in the top part of my comment and let fly to see what would happen.


I was first made aware of GPT-2 from reading Gwern -- "huh, that sounds interesting" -- but didn't really start reading model output until I saw this subreddit:

https://www.reddit.com/r/SubSimulatorGPT2/

There is a companion subreddit, where real people discuss what the bots are posting:

https://www.reddit.com/r/SubSimulatorGPT2Meta/

You can dig around at some of the older posts in there.


I don't think it was marketed as such; they were research projects. GPT-3 was the first to be sold via API.


From a 2019 news article:

> New AI fake text generator may be too dangerous to release, say creators

> The Elon Musk-backed nonprofit company OpenAI declines to release research publicly for fear of misuse.

> OpenAI, a nonprofit research company backed by Elon Musk, Reid Hoffman, Sam Altman, and others, says its new AI model, called GPT2, is so good and the risk of malicious use so high that it is breaking from its normal practice of releasing the full research to the public in order to allow more time to discuss the ramifications of the technological breakthrough.

https://www.theguardian.com/technology/2019/feb/14/elon-musk...


Aka "We cared about misuse right up until it became apparent there was profit to be had."

OpenAI sure speed-ran the Google and Facebook "Don't be evil" -> "Optimize money" transition.


Or - making sensational statements gets attention. A dangerous tool is necessarily a powerful tool, so that statement is pretty much exactly what you'd say if you wanted to generate hype, make people excited and curious about your mysterious product that you won't let them use.


Much like what Anthropic very recently did re: Mythos


Think about all the possible explanations carefully. Weight them based on the best information you have.

(I think the most likely explanation for Mythos is that it's asymmetrically a very big deal. Come to your own conclusions, but don't simply fall back on the "oh this fits the hype pattern" thought terminating cliché.)

Also be aware of what you want to see. If you want the world to fit your narrative, you're more likely to construct explanations for that. (In my friend group at least, I feel like most fall prey to this at least some of the time, including myself. These people are successful and intelligent by most measures.)

Then make a plan to become more disciplined about thinking clearly and probabilistically. Make it a system, not just something you do sometimes. I recommend the book "The Scout Mindset".

Concretely, if one hasn't spent a couple of quality hours really studying AI safety I think one is probably missing out. Dan Hendrycks has a great book.


I used GPT-2 (fine-tuned) to generate Peppa Pig cartoons; it was cutely incoherent: https://youtu.be/B21EJQjWUeQ


And now GPT is laughing, while it replaces coders, lol.


Why? It doesn't have jokey copy. Any thoughts on claude-mem[0] + context-mode[1]?

[0] https://github.com/thedotmack/claude-mem

[1] https://github.com/mksglu/context-mode


The big idea with Memvid was to store embedding vector data as frames in a video file. That didn't seem like a serious idea to me.


Very cool idea. Been playing with a similar concept: break down one image into smaller self-similar images, order them by data similarity, use them as frames for a video

You can then reconstruct the original image by doing the reverse, extracting frames from the video, then piecing them together to create the original bigger picture

Results seem to really depend on the data. Sometimes the video version is smaller than the big picture; sometimes it's the other way around. So you can technically compress some videos by extracting frames, composing a big picture with them, and just compressing with JPEG.


> embedding vector data as frames in a video file

Interesting. When I heard about it, I read the readme, and I didn't take that literally. I assumed it meant using video frames as inspiration.

I've never used it or looked deeper than that. My LLM memory "project" is essentially a `dict<"about", list<"memory">>`. The keys and memories are all embeddings, so vector-searchable. I'm sure it's naive and dumb, but it works for the tiny agents I write.
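The shape is roughly this, if anyone's curious (a Rust-flavored sketch with made-up names, not the actual code):

    type Embedding = Vec<f32>;

    // One "about" topic plus the memories filed under it; the topic and
    // each memory keep an embedding so both levels are vector-searchable.
    struct Topic {
        about: Embedding,
        memories: Vec<(Embedding, String)>,
    }

    fn cosine(a: &[f32], b: &[f32]) -> f32 {
        let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
        let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
        let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
        dot / (na * nb)
    }

    // Nearest topic first, then the best-matching memory inside it.
    fn recall<'a>(topics: &'a [Topic], query: &[f32]) -> Option<&'a str> {
        let topic = topics.iter().max_by(|a, b| {
            cosine(&a.about, query).total_cmp(&cosine(&b.about, query))
        })?;
        topic
            .memories
            .iter()
            .max_by(|a, b| cosine(&a.0, query).total_cmp(&cosine(&b.0, query)))
            .map(|(_, text)| text.as_str())
    }

Brute-force cosine over a handful of topics is plenty at this scale; no vector database needed.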


Just read through the readme, and I was fairly sure this was well-written satire up through "Smart Frames".

Honestly part of me still thinks this is a satire project but who knows.


Time for https://itsid.cloud/index2.html to be acquired by one of the big players, I guess.


Is this... just one file acting as memory?


One video file


A major reason for that is that there's no way to objectively evaluate the performance of LLMs. So the meme projects are equally as valid as the serious ones, since the merits of both are based entirely on anecdata.

It also doesn't help that projects and practices are promoted and adopted based on influencer clout. Karpathy's takes will drown out ones from "lesser" personas, whether they have any value or not.


> most people get the joke

I hope you're right, but from my own personal experience I think you're being way too generous.


It's the same as crypto/NFT hype cycles, except this time one of the joke projects is going to crash the economy.


This was a thing way before AI. Anyone remember Yo, the single-button social media app that raised $1M in 2014?


While the caveman stuff is obviously not serious, there is a lot of legit research in this area.

Which means yes, you can actually influence this quite a bit. Read the paper "Compressed Chain of Thought", for example; it shows it's really easy to make significant reductions in reasoning tokens without affecting output quality.

There is not too much research into this (about 5 papers in total), but with that it’s possible to reduce output tokens by about 60%. Given that output is an incredibly significant part of the total costs, this is important.

https://arxiv.org/abs/2412.13171


Who would suspect that the companies selling 'tokens' would (unintentionally) train their models to prefer longer answers, reaping a HIGHER ROI (the thing a publicly traded company is legally required to pursue: good thing these are all still private...)... because it's not like private companies want to make money...


I don’t think this is a plausible argument, as they’re generally capacity constrained, and everyone would like shorter (= faster) responses.

I’m fairly certain that in a few more releases we’ll have models with shorter CoT chains. Whether they’ll still let us see those is another question, as it seems like Anthropic wants to start hiding their CoT, potentially because it reveals some secret sauce.


I guess mainly they don't want you to distill from their CoT.


Yes, which I understand, but I think they’re crippling their product for users this way.

I don't think it's just this, because the thinking tokens often reveal more about Anthropic's inner workings. For example, it's how the whole existence of Claude's soul document was reverse-engineered; it often leaks details about "system reminders" (e.g. long conversation reminders).

I think it’s also just very convenient for Anthropic to do this. The fact that they’re also presenting this as a “performance optimization” suggests they’re not giving the real reason they do this.


Try setting up one laundry which charges by the hour and washes clothes really really slowly, and another which washes clothes at normal speed at cost plus some margin similar to your competitors.

The one which maximizes ROI will not be the one you rigged to cost more and take longer.


I don't think the analogy is correct here.

Directionally, tokens are not equivalent to "time spent processing your query", but rather a measure of effort/resource expended to process your query.

So a more germane analogy would be:

What if you set up a laundry which charges you based on the amount of laundry detergent used to clean your clothes?

Sounds fair.

But then, what if the top engineers at the laundry offered an "auto-dispenser" that uses extremely advanced algorithms to apply just the right optimal amount of detergent for each wash?

Sounds like value-added for the customer.

... but now you end up with a system where the laundry management team has strong incentives to influence how liberally the auto-dispenser will "spend" to give you "best results"


Shades of “repeat” in lather, rinse, repeat.


LLM APIs sell on value they deliver to the user, not the sheer number of tokens you can buy per $. The latter is roughly labor-theory-of-value levels of wrong.


Some labs do it internally because RLVR is very token-expensive. But it degrades CoT readability even more than normal RL pressure does.

It isn't free either - by default, models learn to offload some of their internal computation into the "filler" tokens. So reducing raw token count always cuts into reasoning capacity somewhat. Getting closer to "compute optimal" while reducing token use isn't an easy task.


Yeah, the readability suffers, but as long as the actual output (i.e. the non-CoT part) stays unaffected, it's reasonably fine.

I work on a few agentic open-source tools, and the interesting thing is that once I implemented these things, the overall feedback was of a performance improvement rather than a reduction, as the LLM would spend much less time generating tokens.

I didn't implement it fully; just a few basic things like "reduce prose while thinking, don't repeat your thoughts", etc., already yielded massive improvements.


Yeah, you could easily imagine stenography-like inputs and outputs for rapid iteration loops. It's also true that in social media people already want faster-to-read snippets that drop grammar, so the desire for density is already there for human authors/readers.


All LLMs also effectively work by "larping" a role. You steer one towards larping a caveman and, well... let's just say cavemen weren't known for their high IQ.


Fun fact: Neanderthals actually had larger brains than Homo sapiens! Modern humans are thought to have outcompeted them by working better together in larger groups, but in terms of actual individual intelligence, Neanderthals may have had us beat. Similarly, humans have been undergoing a process of self-domestication over the last couple of millennia that has resulted in physiological changes, including a smaller brain size. Again, our advantage over our wilder forebears remains that we're better in larger social groups than they were, and better at shared symbolic reasoning and synchronized activity, not necessarily that our brains are more capable.

(No, none of this changes that if you make an LLM larp a caveman it's gonna act stupid, you're right about that.)


I thought we were way past the "bigger brain means more intelligence" stage of neuroscience?


Bigger brain does not automatically mean more intelligence, but we have reasons other than bigger brains to suspect that Homo neanderthalensis may have been more intelligent than contemporary Homo sapiens.


You can't draw conclusions about individuals, but at a species level a bigger brain, especially relative to body size, strongly correlates with intelligence.


All data shows there's a moderate correlation.


Even neuronal density is a simplistic measure, and the dimension of size alone doesn't account for it.


Modern humans were also cavemen.


This is why ancient Chinese scholar mode (also extremely terse) is better.


Exactly. The model is exquisitely sensitive to language. The idea that you would encourage it to think like a caveman to save a few tokens is hilarious but extremely counter-productive if you care about the quality of its reasoning.


Does this imply that if you train it on Gwern style output, the quality will improve?


Unfortunately, that is an oversimplification for a highly RLed/chatbot-trained LLM like Claude-4.7-opus. It may have started life as a base model (where prompting it with correctly spelled prompts, or text from 'gwern', would - and did with davinci GPT-3! - improve quality), but that was eons ago. The chatbots are largely invariant to that kind of prompt trickery and just try to do their best every time. This is why those meme tricks about tips or bribery or my-grandmother-will-die stopped working.


This specific form may be a joke, but token-conscious work is becoming more and more relevant. Look at https://github.com/AgusRdz/chop

And

https://github.com/toon-format/toon


Also https://github.com/rtk-ai/rtk, but some people find that changing how commands output stuff can confuse some models.


I believe tools like graphify cut down the thinking tokens dramatically. It makes a knowledge graph and dumps it into markdown that is honestly awesome. Then it has stubs that pretend to be tools like grep, which read from the knowledge graph first so it does less work. Easy to set up and use, too. I like it.

https://graphify.net/


Output tokens are more expensive


There's a tremendous amount of superstition around LLMs. Remember when "prompt engineering" "best practices" were to say you were offering a tip or some other nonsense?


I hesitated 100% when I saw caveman gaining steam. Changing something like this absolutely changes the behaviour of the model's responses; simply including an "lmao" or something casual in any reply will shift the tone entirely into a more relaxed, "ya whatever" mode.

I think a lot of people echo this same criticism. I would assume the major LLM providers are the actual winners of that repo getting popular as well, for the same reason you stated.

> you will barely save even 1% with such a tool

For the end user, this doesn't make a huge impact; in fact, it potentially hurts if it means you are getting less serious replies from the model itself. However, as with any minor change across a ton of users, this means significant savings for the providers.

I still think keeping the model capable of easily finding what it needs, without having to comb through a lot of files for no reason, is the best current method to save tokens. It potentially takes some upfront tokens if you are delegating the work of keeping those navigation files up to date to the agent, but it pays dividends when, in future sessions, your context window is smaller and only the relevant portions of the project need to be loaded into it.


They are indeed impractical in agentic coding.

However, in deep-research-like products you can have an LLM pass that compresses web page text into caveman speak, hugely reducing token counts.


I don't understand how this would work without a huge loss in resolution or "cognitive" ability.

Prediction works based on the attention mechanism, and current humans don't speak like cavemen - so how could you expect a useful token chain from a model that wasn't trained on speech like that?

I get the concept of transformers, but this isn't doing a 1:1 transform from English to French or whatever; you're fundamentally unable to represent certain concepts effectively in caveman speak, etc... or am I missing something?


Good catch actually.

Okay, maybe not exactly caveman dialect, but text compression using an LLM is definitely a possible way to save on tokens in deep research.


Help me understand: I get that the file reading can be a lot. But I also expand the box to see its “reasoning” and there’s a ton of natural language going on there.


I wonder if you can have it reason in caveman


would you be surprised if this is what happens when you ask it to write like one?

folks could have just asked for _austere reasoning notes_ instead of "write like you suffer from arrested development"


> "write like you suffer from arrested development"

My first thought was that this would mean that my life is being narrated by Ron Howard.


Someone should make an MCP that parses every non-code file before it hits Claude to turn it into caveman talk.


We started out with oobabooga, so caveman is the next logical evolution on the road to AGI.


I mean, we had a shoe company pivot to AI and raise its stock value by 300%; how can we even know anymore?


Lemonade and blockchain rides again!

Or was it iced tea?


You really think the 33k people that starred a 40-line markdown file realize that?


You mean the 33k bots that created a nearly linear stars/day graph? There's a dip in the middle, but it was very blatant at the start (and now)


Stars are more akin to bookmarks and likes these days, as opposed to a show of support or "I use this"


I intentionally throw some weird ones on there just in case anyone is actually ever checking them. Gotta keep interviewers guessing.


I use them like bookmarks.


I use them as likes


Seems like a very low-quality AI-assisted research repo, and it doesn't even properly test against Google's own SynthID detector. It's not hard at all (with some LLM assistance, for example) to reverse-engineer network requests to be able to do SynthID detection without a browser instance or Gemini access, and then you'd have a ground truth.


I read a lot of comments on HN that say something is not hard, yet don't provide a POC of their own or link to research they have knowledge of.

I also read a lot of comments on HN that start by attacking the source of the information, such as saying it was AI-assisted, instead of addressing the actual merits of the work.

The HN community is becoming curmudgeonly and using AI tooling as the justification.


That's how life generally works. If your friend tells you, "I went to that new movie yesterday. It was very boring, I fell asleep midway," then you either listen to their advice or don't. You don't ask your friend if they ever made a movie of their own, and you don't ask for third-party research on that movie either.

As for AI specifically: life is already too short to read all the interesting pages, and AI just makes it so much worse.

- AI is verbose in general, so you spend a lot of time reading without getting many new facts out of it.

- Heavy AI use often means the author has little idea about the topic themselves, and thus cannot engage in the comments. Since discussions with authors are often the most interesting part of HN, that makes the submission less interesting.

And yes, it is possible to use AI assistance to create a nice and concise report on topics you can happily talk about, but then it would not be labeled as "AI".


Becoming? Under most posts that even mention using AI tools in passing, there are multiple people raising their noses and talking about how much they hate AI use.


Eh, just the same people that have been killing tech forums and closing posts on Stack Overflow since forever.


Yes, it's 5x more usage than Plus, and with the current promotions you actually get 10x more usage than Plus on the $100 plan until May 31st.

Same for the $200 plan; it's still 2x its normal usage until that date.


It's not worse; Anthropic simply has no equivalent of GPT 5.4 Pro (if you don't count Mythos). Google does, though: Gemini 3.1 Deep Think.

GPT 5.4 Pro is extremely slow but thorough, so it's not meant for the usual agentic work, but rather for research or for solving hard bugs/math problems when you provide it all the context.


I'm genuinely asking: when you say Gemini 3.1 DT is an equivalent of GPT 5.4 Pro, is there a specific benchmark/comparison you're referring to, or is this more anecdotal?

And do you mean to say that you don't really use GPT 5.4 Pro unless it's for a hard bug? Curious which models you use for system design/architecture/planning vs execution of a plan/design.

TIA! I'm still trying to figure out an optimal system for leveraging all of the LLMs available to us, as I've just been throwing 100% of my work at Claude Code in recent months, but I'd like to branch out.


Pro and the DT model are equivalents because:

- internally, the same best-of-N architecture

- not available in a code harness like Codex, only in the UI (GPT has an API)

- GPT-5.4 Pro is extremely expensive: $30.00 input vs. $180.00 output

- both DT and Pro are really good at solving math problems

