You appear to be mixing two things here. What you described is a form of "slab allocator", where all blocks in a certain size range are allocated from one big piece of memory that is sliced into equally sized chunks. This is really fast, it works great for smaller blocks, and it is the way to speed up STL-heavy code. I did some profiling a while back, and a lock-free slab allocator delivered a 10x speedup in code that worked with std::map and strings.
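To make the slab idea concrete, here is a minimal sketch in C. It is not the allocator I profiled: it is single-threaded rather than lock-free, it serves a single chunk size, and every name in it (`slab`, `slab_init`, `slab_alloc`, `slab_free`, `slab_destroy`) is made up for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical single-size slab; single-threaded, NOT lock-free. */
typedef struct slab {
    unsigned char *memory;  /* the one big piece of memory */
    void *free_list;        /* linked list threaded through the free chunks */
    size_t chunk_size;      /* size of every chunk after rounding */
    size_t chunk_count;
} slab;

/* Slice one malloc'ed region into equally sized chunks. Returns 0 on success. */
static int slab_init(slab *s, size_t chunk_size, size_t chunk_count)
{
    /* Round the chunk size up so each free chunk can hold an aligned link. */
    size_t align = sizeof(void *);
    if (chunk_size < align)
        chunk_size = align;
    chunk_size = (chunk_size + align - 1) / align * align;

    s->memory = malloc(chunk_size * chunk_count);
    if (s->memory == NULL)
        return -1;
    s->chunk_size = chunk_size;
    s->chunk_count = chunk_count;
    s->free_list = NULL;

    /* Push chunks last to first so the first allocation returns chunk 0. */
    for (size_t i = chunk_count; i-- > 0; ) {
        void *chunk = s->memory + i * chunk_size;
        *(void **)chunk = s->free_list;
        s->free_list = chunk;
    }
    return 0;
}

/* Pop a chunk off the free list: O(1), no searching. NULL when exhausted. */
static void *slab_alloc(slab *s)
{
    void *chunk = s->free_list;
    if (chunk != NULL)
        s->free_list = *(void **)chunk;
    return chunk;
}

/* Push a chunk back onto the free list: also O(1). */
static void slab_free(slab *s, void *chunk)
{
    *(void **)chunk = s->free_list;
    s->free_list = chunk;
}

static void slab_destroy(slab *s)
{
    free(s->memory);
    s->memory = NULL;
    s->free_list = NULL;
}
```

Both allocation and release are a couple of pointer moves, which is where the speed comes from. A real allocator would keep one such slab per size class and fall back to malloc for large blocks.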
The second thing is locality. An allocator like halloc could theoretically use the fact that two blocks are explicitly marked as related and allocate them close to each other. However, for this to work the allocation function needs to be passed a pointer to the parent block, which is not the case with halloc. So the API needs to change, the implementation needs to change too, and then it will no longer be a light implementation of a simple idea, but something else.
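For illustration only, this is roughly what a parent-aware allocation call could look like. None of this is halloc's API: `alloc_near`, `release_tree`, `arena`, and `ARENA_BYTES` are hypothetical names, and the sketch assumes a simple bump arena per root block with no per-block free.

```c
#include <stddef.h>
#include <stdlib.h>

#define ARENA_BYTES 4096u                      /* hypothetical arena size */
#define ALIGN_UP(n) (((n) + 15u) & ~(size_t)15u)

/* One contiguous region shared by a root block and its descendants. */
typedef struct arena {
    unsigned char *base;
    size_t used;
    size_t capacity;
} arena;

/* Stored in front of every block so a child can find its parent's arena. */
typedef struct block_header {
    arena *home;
} block_header;

static block_header *header_of(void *block)
{
    return (block_header *)((unsigned char *)block - ALIGN_UP(sizeof(block_header)));
}

/* Bump-allocate header plus payload from the arena; NULL if it is full. */
static void *arena_carve(arena *a, size_t size)
{
    size_t total = ALIGN_UP(sizeof(block_header)) + ALIGN_UP(size);
    if (a->capacity - a->used < total)
        return NULL;
    block_header *h = (block_header *)(a->base + a->used);
    h->home = a;
    a->used += total;
    return (unsigned char *)h + ALIGN_UP(sizeof(block_header));
}

/* parent == NULL starts a new arena; otherwise the new block is placed
 * in the parent's arena, right after whatever was allocated there last. */
static void *alloc_near(void *parent, size_t size)
{
    arena *a;
    if (parent != NULL) {
        a = header_of(parent)->home;
    } else {
        a = malloc(sizeof *a);
        if (a == NULL)
            return NULL;
        a->base = malloc(ARENA_BYTES);
        if (a->base == NULL) {
            free(a);
            return NULL;
        }
        a->used = 0;
        a->capacity = ARENA_BYTES;
    }
    void *block = arena_carve(a, size);
    if (block == NULL && parent == NULL) {     /* fresh arena was too small */
        free(a->base);
        free(a);
    }
    return block;
}

/* Release a root and everything allocated near it in one go. */
static void release_tree(void *root)
{
    arena *a = header_of(root)->home;
    free(a->base);
    free(a);
}
```

Even this toy version needs a per-block header, an arena policy, and a fallback for full arenas (here it just returns NULL), which is the point: once the parent pointer is in the signature, it stops being a light implementation.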