EnPissant's comments | Hacker News

Now try applying this logic to elevators.

> 1. SWE-bench Verified is now saturated at 93.9% (congrats Anthropic), but anyone who hasn't reached that number yet still has more room for growth.

But if some or all players are bench-maxing it, then it becomes a much less useful metric for comparison.

Also, this doesn't address what OpenAI says about the test suite disallowing valid solutions.


Streaming weights from RAM to GPU for prefill makes sense due to batching, and PCIe 5.0 x16 is fast enough to make it worthwhile.

Streaming weights from RAM to GPU for decode makes no sense at all because batching requires multiple parallel streams.

Streaming weights from SSD _never_ makes sense because the delta between SSD and RAM is too large. There is no situation where you would not be able to fit a model in RAM and also have useful speeds from SSD.
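A back-of-envelope sketch of that delta, assuming decode is purely bandwidth-bound and every active parameter is read once per generated token (the parameter count and link speeds below are illustrative assumptions, not measurements):

```python
# Rough ceiling on single-stream decode speed when weights stream over a link,
# assuming decode is bandwidth-bound and every active parameter is read once
# per token. Parameter count and link speeds are illustrative assumptions.

def tokens_per_second(active_param_bytes: float, link_gb_per_s: float) -> float:
    """Upper bound on single-stream decode speed for a streamed model."""
    return link_gb_per_s * 1e9 / active_param_bytes

active_bytes = 17e9 * 2  # e.g. 17B active params at 2 bytes each (bf16)
for name, gb_s in [("PCIe 5.0 x16 (~64 GB/s)", 64.0), ("fast SSD (~8 GB/s)", 8.0)]:
    print(f"{name}: at most ~{tokens_per_second(active_bytes, gb_s):.2f} tok/s")
```

Under those assumed numbers, even PCIe 5.0 caps a single decode stream at roughly 1.9 tok/s and an 8GB/s SSD at roughly 0.24; batching many requests, as in prefill, is what amortizes the weight traffic.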


There have been some very interesting experiments with streaming from SSD recently: https://simonwillison.net/2026/Mar/18/llm-in-a-flash/

I don't mean to be a jerk, but 2-bit quant, reducing experts from 10 to 4, who knows if the test is running long enough for the SSD to thermal throttle, and still only getting 5.5 tokens/s does not sound useful to me.

It's a lot more useful than being entirely unable to try out the model.

But you aren't trying out the model. You quantized beyond what people generally say is acceptable, and reduced the number of experts, which these models are not designed for.

Even worse, the GitHub repo advertises:

> Pure C/Metal inference engine that runs Qwen3.5-397B-A17B (a 397 billion parameter Mixture-of-Experts model) on a MacBook Pro with 48GB RAM at 4.4+ tokens/second with production-quality output including tool calling.

Hiding the fact that the active parameter count is _not_ 17B.


It doesn't have to be a 2-bit quant - see the update at the bottom of my post:

> Update: Dan's latest version upgrades to 4-bit quantization of the experts (209GB on disk, 4.36 tokens/second) after finding that the 2-bit version broke tool calling while 4-bit handles that well.

That was also just the first version of this pattern that I encountered, it's since seen a bunch of additional activity from other developers in other projects.

I linked to some of those in this follow-up: https://simonwillison.net/2026/Mar/24/streaming-experts/


On Apple Silicon Macs, the RAM is shared. So while it may not be up to raw GPU VRAM speeds, it still manages over 450GB/s real-world on the M4 Pro/Max series, to any place it is needed.

They all do have a limitation from the SSD, but Apple SSDs can do over 17GB/s (on high-end models; the more typical ones are around 8GB/s).


Yeah, I am mostly talking about the SSD bottleneck being too slow. There is no way Apple gets 17GB/s sustained: SSDs thermally throttle really fast, and there is some random access involved when it needs the next expert.

Those are quants, not distills.

This should not be the top comment on every model release post. It's getting tiring.

This should be the bottom comment on the pelican comment on every model release post.

Clearly the top comment should be "Imagine a beowulf cluster of Deepseek v4!"

My mother was murdered by Beowulf, you insensitive Claude!

This was perfect.

Are nvfp4 / mxfp4 even useful without QAT?

There are multiple factors at play:

- Among those who believe in intellectual differences among human groups, very few believe Europeans are the most intelligent group. The prevailing opinion you would find is that both Ashkenazi Jews and East Asians (your second example) are more intelligent on average.

- Northwest Europeans encountered intense selection over a millennium starting around 300 AD through a couple of mechanisms: the Church banning cousin marriage, and bipartite manorialism. This resulted in the destruction of kinship networks, established the nuclear family, and selected for the high-trust peoples that enabled a kind of society you can still only find among those peoples.


You're not even responding to the question. You're describing (what you believe to be) features of global civilization today, projected out from "300AD". But at various points over that interval, European civilization wasn't on the leaderboard, and was being outcompeted by the Khmer, Mali, China, you name it.

You see this all the time in these kinds of discussions, the assumption that because "western" civilization possesses X, Y, and Z traits, history must consist of a linear progression towards realizing those traits. Obviously, no. For many centuries the west was brutal, illiterate, tribal, and chaotic, primitive in ways other cultures were not.

It's just a tedious history lesson except that it abruptly falsifies the idea that you can look at "civilizational achievement" and reason back to genetic superiority. Obviously you cannot. You could come up with some other evidence for genetic superiority! But this particular argument is patently wrong.


The notion that you can only find a high-trust society among Europeans looks like transparent bunk to me, easily refuted by looking at other highly developed countries. And the notion that this might be a matter of genetics even more so. Sadly, we won't be able to use CRISPR therapy as a way of improving social trust anytime soon!

Which other developed countries do you mean? The only ones I can think of, have westernised on purpose. E.g. Singapore and Japan.

What you call "westernised" is just describing the adoption of bourgeois and open-market norms. There's nothing about these norms that's inherent to what we call the West: classical Western culture (Greece and Rome, though the attitude persisted well into the Middle Ages and ultimately fed into multiple streams of modern-era thought), like other ancient societies, actively despised market participants, broadly equating them with swindlers.

That is sort of my point, I can't think of a developed country that hasn't westernised to some extent.

And yet when you look at corruption perception indexes, they largely track the Hajnal line. As do so many other graphs.

Ireland and Eastern Europe are outside the Hajnal line, and yet they're among the biggest growth success stories. I suppose we'll get our answer soon enough as to whether it really matters. Do note, however, that a late marriage age for females (the key finding of the Hajnal line) implies that they have to be enabled to self-support via work, which was a key step towards modern bourgeois norms and was also inherently correlated with general prosperity.

> He characterized Stanford and MIT as “mainly political lobbying operations fighting American innovation at this point” and vowed that universities would “pay the price” after “they declared war on 70% of the country.”

Oh? He vowed what? To make them pay the price? Or did he just predict a cause and effect and The Nation (your source) is libeling him?


From their home page:

> Why?

> Commutation

> In Pijul, independent changes can be applied in any order without changing the result or the version's identifier. This makes Pijul significantly simpler than workflows using git rebase or hg transplant. Pijul has a branch-like feature called "channels", but these are not as important as in other systems. For example, so-called feature branches are often just changes in Pijul. Keeping your history clean is the default.

This is a useless property because the graph is only encoded at the patch layer. In the real world you have far more semantic dependencies than patch dependencies: e.g., I have a patch that adds a function that calls a function added in another patch. Pijul doesn't know about that.

> Merge correctness

> Pijul guarantees a number of strong properties on merges. The most important one is that the order between lines is always preserved. This is unlike 3-way merge, which may sometimes shuffle lines around. When the order is unknown (for example in the case of concurrent edits), this is a conflict, which contrasts with systems with "automatic" or "no conflicts" merges.

I can't remember being bitten by this, and you don't need Pijul to solve this. A merge algorithm that leverages `git blame` information would work just as well. It's just nobody cares enough to use such a thing.

> First-class conflicts

> In Pijul, conflicts are not modelled as a "failure to merge", but rather as the standard case. Specifically, conflicts happen between two changes, and are solved by one change. The resolution change solves the conflict between the same two changes, no matter if other changes have been made concurrently. Once solved, conflicts never come back.

Conflicts coming back is not an issue in git. For some reason people think they need to use rebase when they should almost always be using merge.

> Partial clones

> Commutation makes it possible to clone only a small subset of a repository: indeed, one can only apply the changes related to that subset. Working on a partial clone produces changes that can readily be sent to the large repository.

Git and other snapshot-based SCMs do this far, far better. Git can check out only a set of files or directories, and the tree structure encoded in git objects in its db makes this very efficient. You could even build a FUSE layer to lazily fetch content. With Pijul you would have to extremely carefully maintain your history to allow this: i.e., when you have a patch that modifies 2 other patches, those are merged forever if you need the changes in the merger. Imagine a PR that reformatted all files in the repo or changed a top-level interface and fixed all users in the same PR. Whoops, everything is now interdependent, no more partial clones.


Git's patching functionality is pretty awful in actual practice. For example, try building a git patch that applies a standard set of patches between two divergent (vendored) kernel trees. It's usually a mess.

It's also pretty easy to find a set of patches that have to be applied in order. Then someone copies those patches onto another divergent tree, which has its own set of custom patches without renaming. This is dangerous in git and probably sensible in pijul.

I haven't used Pijul in practice, but it's not hard to imagine it being better than git here.


I think patching/cherry-picking is just inherently complicated and needs intelligence applied. I don't think Pijul is going to be any better here.


I remember working with Darcs 20 years ago (Pijul is in that lineage), and cherry-picking in it was way better than doing it with git, since it meant "get this change and all the required ones" rather than "pick this commit".

It still required intelligence (changes across files may not be tracked as dependent but actually are) but it was a different experience from what git provides.


If you really wanted, you could implement this on top of a snapshot-based system by mixing in git blame information (or a similar algorithm). It's not hard to compute that text-based dependency graph on the fly, though it may be expensive without caching.
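A minimal sketch of that idea, using a hand-written stand-in for blame data rather than a real repository (the function name and data shapes are hypothetical): each line maps to the commit that last touched it, as `git blame --line-porcelain` would report, and a new patch textually depends on the commits behind the lines it modifies.

```python
# Hypothetical sketch: recover Darcs/Pijul-style textual patch dependencies
# from blame data in a snapshot-based VCS. `blame` maps each line number of a
# file to the commit that last modified it; `touched` is the set of line
# numbers a new patch modifies.

def patch_dependencies(blame: dict[int, str], touched: set[int]) -> set[str]:
    """Commits whose changes the new patch textually depends on."""
    return {blame[line] for line in touched if line in blame}

# Toy stand-in: lines 1-2 were last written by commit "a1", line 3 by "b2".
blame = {1: "a1", 2: "a1", 3: "b2"}
print(sorted(patch_dependencies(blame, {2, 3})))  # ['a1', 'b2']
```

A real implementation would also need to handle added lines (which depend on the commits around the insertion point), which is where the caching cost comes in.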


Cherry picking is just convenience for something that anyone could do manually. If it didn't exist in the VCS, people would do it anyway and make tools to do it.

Fossil's implementation is the best, since a cherry-picked commit always points back to its origin.
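Git's closest equivalent is `cherry-pick -x`, which appends the origin commit hash to the new commit's message. A throwaway sketch, assuming git is on PATH (file names and messages are arbitrary):

```python
# `git cherry-pick -x` appends the origin commit hash to the new commit's
# message, git's nearest analogue to Fossil's first-class back-pointer.
# Builds a throwaway repo in a temp directory; requires git on PATH.
import subprocess, tempfile
from pathlib import Path

repo = tempfile.mkdtemp()

def git(*args):
    return subprocess.run(["git", *args], cwd=repo, check=True,
                          capture_output=True, text=True).stdout

f = Path(repo, "f.txt")
git("init", "-q")
git("checkout", "-qb", "main")
git("config", "user.email", "you@example.com")
git("config", "user.name", "you")
f.write_text("one\n");      git("add", "f.txt"); git("commit", "-qm", "one")
git("checkout", "-qb", "feature")
f.write_text("one\ntwo\n"); git("commit", "-qam", "two")
src = git("rev-parse", "HEAD").strip()
git("checkout", "-q", "main")
git("cherry-pick", "-x", src)
# The message now ends with "(cherry picked from commit <src>)".
print(git("log", "-1", "--format=%B"))
```

Unlike Fossil's pointer, though, this is just text in the message; nothing in git's data model enforces or follows it.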


That is kind of the point of Pijul: first-class support for "how do I combine these", which git mostly discards as unimportant.

But for a lot of work at scale (or across many people), mixing bits is important.


Speaking as a former Darcs user (Darcs is another patch-based VCS that Pijul draws inspiration from):

"This is a useless property because the graph is only encoded at the patch layer. In the real world you have far more semantic dependencies than patch dependencies. ie, I have a patch that adds a function that calls a function added in another patch. Pijul doesn't know about that."

Darcs solved this in two different ways.

1. Dependencies. While Darcs patches would inherently "depend" on the last patch which affected the same lines, the committer could specify other patches as dependencies, handling exactly the case you described.

2. Practicality. In reality, there's no scenario where someone is pulling your "use function X" patches and also not pulling the "define function X" patch. They could if they really want to, but this would be a lot of effort for a deliberately bad result. It would be like, in Git, cherry-picking the "use" patches without the "define" patches. In neither system would this happen by accident.

"Conflicts coming back is not an issue in git. For some reason people think they need to use rebase when they should almost always be using merge."

There's a big difference between "conflicts shouldn't come back as long as everyone does what I want" and "conflicts don't come back". As long as you're using Git with other people, the rebase-lovers and their problems will be a perpetual issue. I've been on 3 teams in a row with this problem.

I deliberately moved away from Darcs after a few years - the benefit of snapshot VCS is that you don't just have the change, but you have the whole context in which the change happened. (Also branch discovery in Darcs basically doesn't exist; Pijul fixed this, at least!) I love Fossil and Mercurial for their adherence to accurate history.


What I want is a system that records how conflicts are resolved and tries to apply that resolution. Let's say I apply patch A, then patch B, there is a conflict, and I resolve it. Then someone else applies patch B and then patch A. The VCS should know that this conflict is already resolved and apply the solution. Likewise, when applying first patch C, then A and B, it should let you resolve the conflict from AB to ABC and again record the resolved conflict for the future. I'm actually fine with manually resolving conflicts, but I don't want to do it twice if I don't have to. This would be a great way to organize something like dwm, but I couldn't get it to work with Pijul at all.


I believe this is what `git rerere` does
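A throwaway script to see it in action, assuming git is on PATH (file names and messages are arbitrary): rerere records the resolution the first time, then replays it when the same conflict reappears.

```python
# Demonstrates `git rerere` replaying a recorded conflict resolution.
# Builds a throwaway repo in a temp directory; requires git on PATH.
import os, subprocess, tempfile

repo = tempfile.mkdtemp()

def git(*args, ok=True):
    r = subprocess.run(["git", *args], cwd=repo, capture_output=True, text=True)
    if ok and r.returncode != 0:
        raise RuntimeError(r.stderr)
    return r

def write(text):
    with open(os.path.join(repo, "f.txt"), "w") as f:
        f.write(text + "\n")

git("init", "-q")
git("checkout", "-qb", "main")
git("config", "user.email", "you@example.com")
git("config", "user.name", "you")
git("config", "rerere.enabled", "true")

write("base");    git("add", "f.txt"); git("commit", "-qm", "base")
git("checkout", "-qb", "feature")
write("feature"); git("commit", "-qam", "feature")
git("checkout", "-q", "main")
write("main");    git("commit", "-qam", "main")

git("merge", "feature", ok=False)       # conflicts; rerere records the preimage
write("resolved"); git("add", "f.txt")
git("commit", "-qm", "merge")           # rerere records the resolution here

git("reset", "-q", "--hard", "HEAD~1")  # throw the merge away...
git("merge", "feature", ok=False)       # ...redo it: rerere replays the fix
print(open(os.path.join(repo, "f.txt")).read().strip())  # resolved
```

Note rerere only caches exact conflict hunks per repository; it doesn't share resolutions between people the way the parent comment wants, though the recorded resolutions can be copied or scripted around.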


> 1. Dependencies. While Darcs patches would inherently "depend" on the last patch which affected the same lines, the committer could specify other patches as dependencies, handling exactly the case you described.

I don't know who would want to put in the work of mapping out the semantic dependency graph just because maybe some day someone might want to compose a slightly different set of patches. And even if everyone tried, it would surely still fail, because doing so is extremely hard, if not impossible.

> There's a big difference between "conflicts shouldn't come back as long as everyone does what I want" and "conflicts don't come back". As long as you're using Git with other people, the rebase-lovers and their problems will be a perpetual issue. I've been on 3 teams in a row with this problem.

Just stop using rebase is much easier to socialize than let's all move to Pijul. It's also the correct thing to do.

> the benefit of snapshot VCS is that you don't just have the change, but you have the whole context in which the change happened.

I strongly agree with this and think it's the only tractable way to operate.


For patch sets/commutation, I find their system appealing. I think it's tempting to yearn for something better still (e.g., Darcs had a kind of sed-like patch that could apply its search to patches that are merged with it). If you look at how big companies do development, code review, CI, etc. into monorepos, it is typically done with diff-focused thinking. Having the VCS and associated metadata attached to the individual patches (or sets thereof) feels like it could be an improvement over rebases or complex merge structures.


> Git can checkout only a set of files or directories

How do you do this? With submodules / subtrees?


git sparse-checkout
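For example, a throwaway sketch assuming git is on PATH (the repo layout is arbitrary): clone with `--no-checkout`, restrict to a directory, then check out.

```python
# Demonstrates `git sparse-checkout`: materialize only chosen directories.
# Builds a throwaway origin repo and a partial clone; requires git on PATH.
import subprocess, tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
origin, partial = base / "origin", base / "partial"

def git(*args, cwd):
    subprocess.run(["git", *args], cwd=cwd, check=True, capture_output=True)

(origin / "src").mkdir(parents=True)
(origin / "docs").mkdir()
(origin / "src" / "a.txt").write_text("a\n")
(origin / "docs" / "b.txt").write_text("b\n")
git("init", "-q", cwd=origin)
git("checkout", "-qb", "main", cwd=origin)
git("config", "user.email", "you@example.com", cwd=origin)
git("config", "user.name", "you", cwd=origin)
git("add", ".", cwd=origin)
git("commit", "-qm", "init", cwd=origin)

git("clone", "-q", "--no-checkout", str(origin), str(partial), cwd=base)
git("sparse-checkout", "set", "src", cwd=partial)
git("checkout", "-q", "main", cwd=partial)

# Only src/ is materialized; docs/ never touches the working tree.
print(sorted(p.name for p in partial.iterdir() if p.name != ".git"))
```

Combined with `--filter=blob:none` on the clone, the excluded blobs aren't even fetched until needed.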



Even if you believe the "we don't train on your data" claim/lie, that leaves a whole lot of things they can do with it besides training directly on it.

Analytics can be run on it, they can run it through their own models, synthetic training data can be derived from it, it can be used to build profiles on you/your business, they could harvest trade/literal secrets from it, they could store derivatives of your data to one day sell to competitors/compete themselves, they can use it to gauge just how dependent you've made yourself/business on their LLMs and price accordingly, etc.


No. Your data and any derivative of it do not leave RAM unless you are detected as doing something that qualifies as abuse, in which case it is retained for 30 days.


Even the process of deciding what "qualifies as abuse" does what I'm talking about: they're analyzing your data with their own models and doing whatever they want with the results, including storing it and using it to ban you from the product you paid for, and call the police on you.

Either way, I don't believe it.


You are a Star Wars Rebel fighting Darth Vader. Good job!


Thanks


That's about the API. It doesn't say anything about their other products like Codex. Moreover, even in the API it says you have to qualify for zero retention policies. They retain the data for however long each jurisdiction requires data retention & they are always improving their abuse detection using the retained data.

> Our use of content. We may use Content to provide, maintain, develop, and improve our Services, comply with applicable law, enforce our terms and policies, and keep our Services safe. If you're using ChatGPT through Apple's integrations, see this Help Center article (opens in a new window) for how we handle your Content.

> Opt out. If you do not want us to use your Content to train our models, you can opt out by following the instructions in this article . Please note that in some cases this may limit the ability of our Services to better address your specific use case.

https://openai.com/policies/row-terms-of-use/ https://openai.com/policies/how-your-data-is-used-to-improve...


Codex just talks to the responses API with store=false. So unless the model detects you are doing something that qualifies as abuse, nothing is retained.
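Roughly, the request body looks like this (the model name and input are illustrative; `store` is the retention flag in question):

```python
# Sketch of a Responses API request body with persistence disabled.
# Model name and input are illustrative placeholders.
payload = {
    "model": "gpt-5",                    # illustrative model name
    "input": "Refactor this function.",  # illustrative prompt
    "store": False,  # ask the API not to persist the request/response
}
print(payload["store"])  # False
```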


Alright, good luck to you. I'm not really interested in talking to people who think they're lawyers for AI providers. If you think they don't keep any of the data & don't use it for training then you are welcome to continue believing that. It makes no difference to me either way.


> Alright, good luck to you. I'm not really interested in talking to people who think they're lawyers for AI providers.

Codex is open source, you can inspect it yourself, but let's not let facts ruin your David vs Goliath fantasy.


And you believe them?


Yes. That's the rational position.

