> If pointers are magic O(1) values Um... pointers *are* O(1) size. Usually eith...

SamReidHughes · on March 25, 2015

>Um... pointers are O(1) size. Usually either a fixed 32 or 64 bits.

If you want to talk about asymptotic growth of an algorithm, you need it defined on a machine model that supports infinite memory. Saying that pointers are special O(1) size values are a reasonable way to do that. Another way is to let them be variable size, and include that cost in your measurements. With your 64-bit pointers, you're overpaying that cost. And the transition from 32 bit pointers to 64 bit pointers is an object lesson on the real existence of this logarithmic factor that you'd like to pretend doesn't exist.

Edit: also, going with O(1) pointers, the expected asymptotic space usage doesn't mean anything without specifying the probability distribution of dictionaries involved (maybe parameterized on the size n). It's not hard to construct one that changes the outcome.

LukeShu · on March 25, 2015

Ok, fair.

It would have been correct for the original author to have written that most collections have ϴ(n) space complexity, where radix trees only have O(n) space complexity.

It's common for people to use O when they mean ϴ or Ω. All the original statement was saying is that most collections use as much data as you put in, but some (like radix tree) can use less.

(for fun, if we don't give O(1) pointers: If I'm not mistaken, radix tree has O(m*log(m)+n), Ω(m) space-complexity, which is O(n) iff the average size of an element is ≥ log(m); which it almost certainly would be)

Veedrac · on March 25, 2015

> which is O(n) iff the average size of an element is ≥ log(m); which it almost certainly would be

That's stronger than a mere "almost certainly". At least some constant fraction (dependent on the alphabet) would be of size Ω(log m) for some base also dependent on the alphabet, so the average size is also Ω(log m).

I'm not sure how you get your best case, though. It seems to me that the best case has at least Θ(m) nodes since there are m contained strings. The best case for this is each being of minimum size - which is constant size. Each node but the first has a pointer to it of size Ω(log m), so the overall minimum cost is Ω(m log m).