Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Random UUID's are super useful when you have distributed creation of UUID's, because you avoid conflicts with very high probability and don't rely on your DB to generate them for you, and they also leak no information about when or where the UUID was created.

Postgres is happier with sequence ID's, but keeping Postgres happy isn't the only design goal. It does well enough for all practical purposes if you need randomness.



> Postgres is happier with sequence ID's, but keeping Postgres happy isn't the only design goal.

It literally is the one thing in the entire stack that must always be happy. Every stateful service likely depends on it. Sad DBs means higher latency for everyone, and grumpy DBREs getting paged.


Postgres is usually completely happy enough with UUIDv4. Overall architecture (such as allowing distributed id generation, if relevant) is more important than squeezing out that last bit of performance, especially for the majority of web applications who don't work with 10 million+ rows.


If your app isn’t working with billions of rows, you really don’t need to be worrying about distributed anything. Even then, I’d be suspicious.

I don’t think people grasp how far a single RDBMS server can take you. Hundreds of thousands of queries per second are well in reach of a well-configured MySQL or Postgres instance on modern hardware. This also has the terrific benefit of making reasoning about state and transactions much, much simpler.

Re: last bit of performance, it’s more than that. If you’re using Aurora, where you pay for every disk op, using UUIDv4 as PK in Postgres will approximately 7x your IOPS for SELECTs using them, and massively (I can’t quantify it on a general basis; it depends on the rest of the table, and your workload split) increase them for writes. That’s not free. On RDS, where you pay for disk performance upfront, you’re cutting into your available performance.

About the only place it effectively doesn’t matter except at insane scale is on native NVMe drives. If you saturate IOPS for one of those without first saturating the NIC, I would love to see your schema and queries.


Scale isn’t the only reason to have distributed systems. You could very well have a tiny but distributed system


Sometimes distribution is not for performance but tenant isolation for regulatory or general isolation purposes. I work in such an industry.


Fair point. You can still use monotonic IDs with these, via either interleaving chunks to each DB, or with a central server that allocates them – the latter approach is how Slack handles it, for example.


DBRE? I guess DBA is too old fashioned for the cool kids?


Listen, I didn't make the title up, I just grabbed onto it from the SRE world because I love databases.

There are some pragmatic differences I've found, though - generally, DBAs are less focused on things like IaC (though I know at least one who does), SLIs/SLOs, CI/CD, and the other things often associated with SRE. So DBRE is SRE + DBA, or a DB-focused SRE, if you'd rather.


> Random UUID's are super useful when you have distributed creation of UUID's, because you avoid conflicts with very high probability and don't rely on your DB to generate them for you

See Snowflake IDs for a scheme that gives you the benefit of random UUIDs but are strictly increasing. Which is really UUIDv7 but fits in your bigint column. No entropy required.


The whole point of UUID is distributed creation. There's nothing about random ones (UUIDv4) that makes it better for this purpose.




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: