GC burns far more CPU cycles. And I'm not sure where you got this idea about the value of CPU cycles relative to RAM. Most tasks stall on IO; those that don't typically stall on memory bandwidth or latency. Meanwhile, CPU-bound tasks typically don't perform allocations, and when forced to, they avoid the heap like the plague.
Far less so for moving collectors. That's why they're used: to reduce the overhead of malloc/free-based memory management. The whole point of moving collectors is that they can make the CPU cost of memory management arbitrarily low, even lower than stack allocation. In practice it's more complicated, but the principle stands.
The reason some programs "avoid the heap like the plague" is because their memory management is CPU-inefficient (as in the case of malloc/free allocators).
> Meanwhile I'm not sure where you got this idea about the value of CPU cycles relative to RAM
There is a fundamental relationship between CPU and RAM. As we learn in basic complexity theory, the power of what can be computed depends on how much memory an algorithm can use. On the flip side, using memory and managing memory requires CPU.
To get the most basic intuition, let's look at an extreme example. Consider a machine with 1 GB of free RAM and two programs that compute the same thing and consume 100% CPU for their duration. One uses 80 MB of RAM and runs for 100 s; the other uses 800 MB of RAM and runs for 99 s (perhaps thanks to a moving collector). Which is more efficient? It may seem that we need to weigh a 1% CPU reduction against a 10x increase in RAM consumption, but that's not necessary: the second program is more efficient. Why? Because while a program consumes 100% of the CPU, no other program can run and make use of any RAM, so both programs effectively capture all 1 GB; the second program just captures it for one second less.
This scales even to cases where CPU consumption is below 100%, because the important thing to realise is that the two resources are coupled. What needs to be optimised isn't CPU and RAM separately, but the RAM/CPU ratio. A program can be inefficient by using too little RAM, if using more RAM could reduce its CPU consumption towards the right ratio (e.g. by using a moving collector), and vice versa.
There are (at least) two glaring issues with your analysis. First, the vast majority of workloads don't block on CPU (as I previously pointed out) and when they do they almost never do heap allocations in the hot path (again, as I previously pointed out). Second, we don't use single core single thread machines these days. Most workloads block on IO or memory access; the CPU pipeline is out of order and we have SMT for precisely this reason.
Anyway I'm not at all inclined to blindly believe your claim that malloc/free is particularly expensive relative to various GC algorithms. At present I believe the opposite (that malloc/free is quite cheap) but I'm open to the possibility that I'm misinformed about that. You're going to need to link to reputable benchmarks if you expect me to accept the efficiency claim, but even then that wouldn't convince me that any extra CPU cycles were actually an issue for the reasons articulated in the preceding paragraph.
> There are (at least) two glaring issues with your analysis. First, the vast majority of workloads don't block on CPU (as I previously pointed out) and when they do they almost never do heap allocations in the hot path (again, as I previously pointed out). Second, we don't use single core single thread machines these days. Most workloads block on IO or memory access; the CPU pipeline is out of order and we have SMT for precisely this reason.
This doesn't matter, because if you're running a single program on a machine, it might as well use all the CPU and all the RAM; as long as you're under 100% on both, you're good. But we want to utilise the hardware well because we typically want to run multiple programs (or VMs) on a single machine, and the machine is exhausted when the first of CPU or RAM is exhausted. So the question is how your CPU and RAM usage should be balanced to offer optimal utilisation, given that the machine is spent when the first of the two is spent. E.g. if you can only run two programs because each uses 50% of the CPU, then their each using only 5% of RAM saves you nothing, as no third program can run to use the remaining 90%. If you spend either one of these resources in an unbalanced way, you're not using your hardware optimally: using 2% more CPU to save 200 MB of RAM could be suboptimal.
I'm not saying that every program that uses X% of CPU must also use exactly X% of RAM or else it's wasting one or the other, but that's the general perspective of how to think about efficiency. Using a lot of one and little of the other is, broadly speaking, not very efficient.
> Anyway I'm not at all inclined to blindly believe your claim that malloc/free is particularly expensive relative to various GC algorithms. At present I believe the opposite (that malloc/free is quite cheap) but I'm open to the possibility that I'm misinformed about that.
You are.
> You're going to need to link to reputable benchmarks if you expect me to accept the efficiency claim, but even then that wouldn't convince me that any extra CPU cycles were actually an issue for the reasons articulated in the preceding paragraph.
I don't believe there are any reputable benchmarks of full applications (which is where memory-management matters) that are apples-to-apples. I'm speaking from over two decades of experience with C++ and Java.
The important property of moving collectors is that they give you a knob that allows you to turn RAM into CPU and vice versa (to some extent), and that's what you want in order to achieve the efficient balance.
Moving collectors as generally used are a huge waste of memory throughput, and this shows up consistently in the performance measurements. Moving data is very expensive! The whole point of ownership tracking in programming languages is so that large chunks of "owned" data can just stay put until freed, and only the owning handle (which is tiny) needs to move around. Most GC programming languages do a terrible job of supporting that pattern.
That's just not true. To give you a few pieces of the picture, moving collectors move little memory and do so rarely (relative to the allocation rate):
In the young generation, few objects survive, so few are moved (the very few that survive longer are promoted into the old generation). In the old generation, most objects survive, but the allocation rate there is so low that moving them is rare. (Indeed, the memory-management technique in the old generation matters less precisely because its allocation rate is so low; whether to use a moving algorithm there is less about speed and more about other concerns.)
On top of that, the general principle of moving collectors (and why in theory they can be cheaper than stack allocation) is that the cost of each collection is roughly constant for a given workload (it depends on the live set, not on how much was allocated), while the frequency of collections can be made as low as you want by using more RAM.
The reason moving collectors are used in the first place is to reduce the high overhead of malloc/free allocators.
Anyway, the general point I was making above is that a machine is exhausted not when both CPU and RAM are exhausted, but when one of them is. Efficient hardware utilisation is when the program strikes some good balance between them. There's not much point to reducing RAM footprint when CPU utilisation is high or reducing CPU consumption when RAM consumption is high. Using much of one and little of the other is wasteful when you can reduce the higher one by increasing the other. Moving collectors give you a convenient knob to do that: if a program consumes a lot of CPU and little RAM, you can increase the heap and turn some RAM into CPU and vice versa.
Whatever little CPU malloc/free allocators waste is often worth more than the RAM they save.
> For cases where they are we've got stuff like arena allocators.
... that work by using more RAM to save on CPU.