Random UUID's are super useful when you have distributed creation of UUID's, bec...

sgarland · on July 6, 2024

> Postgres is happier with sequence ID's, but keeping Postgres happy isn't the only design goal.

It literally is the one thing in the entire stack that must always be happy. Every stateful service likely depends on it. Sad DBs mean higher latency for everyone, and grumpy DBREs getting paged.

matharmin · on July 6, 2024

Postgres is usually perfectly happy with UUIDv4. Overall architecture (such as allowing distributed ID generation, if relevant) matters more than squeezing out that last bit of performance, especially for the majority of web applications, which don't work with 10 million+ rows.

sgarland · on July 6, 2024

If your app isn’t working with billions of rows, you really don’t need to be worrying about distributed anything. Even then, I’d be suspicious.

I don’t think people grasp how far a single RDBMS server can take you. Hundreds of thousands of queries per second are well within reach of a well-configured MySQL or Postgres instance on modern hardware. This also has the terrific benefit of making reasoning about state and transactions much, much simpler.

Re: last bit of performance, it’s more than that. If you’re using Aurora, where you pay for every disk op, using UUIDv4 as a PK in Postgres will roughly 7x your IOPS for SELECTs on those keys, and massively increase them for writes (I can’t quantify that in general; it depends on the rest of the table and your workload split). That’s not free. On RDS, where you pay for disk performance upfront, you’re cutting into your available performance.
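A rough way to see why random keys cost more I/O: in a B-tree index, sequential IDs keep landing on the same rightmost leaf pages, while UUIDv4 values scatter across the whole index, so a batch of recent rows touches far more pages. The sketch below is a deliberately simplified model (an assumption, not Postgres's actual page layout): every leaf page is exactly full, with no page splits or fill factor, and the table size, batch size, and page capacity are made-up numbers. It doesn't reproduce the 7x figure above; it only illustrates the locality difference.

```python
import random
import uuid


def leaf_pages_touched(keys, recent, keys_per_page):
    """Count distinct leaf pages holding the `recent` most recently inserted keys.

    Simplified model (an assumption): every leaf page is exactly full, so a key's
    page is its rank in sorted order divided by keys_per_page. Page splits and
    fill factor are ignored.
    """
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    page_of = [0] * len(keys)
    for rank, idx in enumerate(order):
        page_of[idx] = rank // keys_per_page
    # List order == insert order, so the last `recent` entries are the newest rows.
    return len({page_of[i] for i in range(len(keys) - recent, len(keys))})


N_ROWS = 100_000      # hypothetical table size
RECENT = 1_000        # hypothetical batch of newest rows
KEYS_PER_PAGE = 100   # hypothetical leaf-page capacity

rng = random.Random(42)  # seeded so the random run is repeatable
sequential_ids = list(range(1, N_ROWS + 1))
random_ids = [uuid.UUID(int=rng.getrandbits(128), version=4).int for _ in range(N_ROWS)]

seq_pages = leaf_pages_touched(sequential_ids, RECENT, KEYS_PER_PAGE)
rand_pages = leaf_pages_touched(random_ids, RECENT, KEYS_PER_PAGE)
print(f"sequential: {seq_pages} pages, UUIDv4: {rand_pages} pages")
```

With these parameters the sequential run touches exactly 10 leaf pages (1,000 rows / 100 per page), while the UUIDv4 run lands on several hundred distinct pages.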

About the only place it effectively doesn’t matter, short of insane scale, is on native NVMe drives. If you saturate IOPS for one of those without first saturating the NIC, I would love to see your schema and queries.

fovc · on July 6, 2024

Scale isn’t the only reason to have distributed systems. You could very well have a tiny but distributed system.

wibblewobble125 · on July 6, 2024

Sometimes distribution is not for performance but for tenant isolation, whether for regulatory or general isolation purposes. I work in such an industry.

sgarland · on July 6, 2024

Fair point. You can still use monotonic IDs with these, either by interleaving chunks across the DBs or by having a central server allocate them; the latter approach is how Slack handles it, for example.
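The two allocation schemes mentioned above can be sketched in a few lines of Python. Everything here is hypothetical (function and class names, block size, the in-process "server"); interleaving is similar in spirit to MySQL's `auto_increment_increment` / `auto_increment_offset` settings, and the central server is modeled as a simple block leaser, not as Slack's actual design.

```python
import itertools
import threading


def interleaved_ids(db_index, db_count, start=1):
    """Hypothetical per-DB generator: DB k issues start+k, start+k+n, start+k+2n, ..."""
    return itertools.count(start + db_index, db_count)


class CentralAllocator:
    """Hypothetical central ID server that leases out contiguous blocks of IDs."""

    def __init__(self, block_size=1000):
        self._next = 1
        self._block_size = block_size
        self._lock = threading.Lock()

    def lease_block(self):
        # One round trip per block; the lock keeps concurrent leases disjoint.
        with self._lock:
            start = self._next
            self._next += self._block_size
        return range(start, start + self._block_size)


class BlockClient:
    """Runs next to each DB and hands out IDs locally from its current block."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._ids = iter(())

    def next_id(self):
        try:
            return next(self._ids)
        except StopIteration:
            # Current block exhausted (or none leased yet): lease a fresh one.
            self._ids = iter(self._allocator.lease_block())
            return next(self._ids)


db0, db1 = interleaved_ids(0, 2), interleaved_ids(1, 2)
print([next(db0) for _ in range(3)])  # [1, 3, 5]
print([next(db1) for _ in range(3)])  # [2, 4, 6]

server = CentralAllocator(block_size=2)
a, b = BlockClient(server), BlockClient(server)
print([a.next_id(), b.next_id(), a.next_id(), a.next_id()])  # [1, 3, 2, 5]
```

Both keep IDs unique without coordinating on every insert, but they are only monotonic per DB (or per leased block), not globally ordered by creation time.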

SoftTalker · on July 6, 2024

DBRE? I guess DBA is too old-fashioned for the cool kids?

sgarland · on July 6, 2024

Listen, I didn't make the title up; I just grabbed onto it from the SRE world because I love databases.

There are some pragmatic differences I've found, though: generally, DBAs are less focused on things like IaC (though I know at least one who does), SLIs/SLOs, CI/CD, and the other things often associated with SRE. So DBRE is SRE + DBA, or a DB-focused SRE, if you'd rather.

Spivak · on July 6, 2024

> Random UUID's are super useful when you have distributed creation of UUID's, because you avoid conflicts with very high probability and don't rely on your DB to generate them for you

See Snowflake IDs for a scheme that gives you the benefits of random UUIDs but is strictly increasing. It's really UUIDv7, except it fits in your bigint column. No entropy required.
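A minimal Snowflake-style generator, assuming the commonly cited layout from Twitter's original design: a 41-bit millisecond timestamp since a custom epoch, a 10-bit worker ID, and a 12-bit per-millisecond sequence, packed into a positive 64-bit integer that fits a Postgres `bigint`. This is a sketch, not Twitter's code; the class name and the injectable `clock` parameter are illustrative. IDs are strictly increasing per generator; across workers they are only roughly time-ordered.

```python
import threading
import time

TWITTER_EPOCH_MS = 1288834974657  # custom epoch used by Twitter's original Snowflake


class SnowflakeGenerator:
    """Sketch: 41-bit ms timestamp | 10-bit worker ID | 12-bit sequence (top bit unused)."""

    def __init__(self, worker_id, epoch_ms=TWITTER_EPOCH_MS, clock=None):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        # Injectable clock (hypothetical parameter) so tests can pin the time.
        self.clock = clock or (lambda: time.time_ns() // 1_000_000)
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: spin until the clock advances.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.epoch_ms) << 22) | (self.worker_id << 12) | self.sequence


gen = SnowflakeGenerator(worker_id=7)
print(gen.next_id() < gen.next_id())  # True
```

No randomness is involved: uniqueness comes from giving each worker its own ID, which is exactly the coordination that random UUIDs let you skip.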

globular-toast · on July 6, 2024

The whole point of UUIDs is distributed creation. There's nothing about random ones (UUIDv4) that makes them better for this purpose.
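A time-ordered UUIDv7 illustrates the point: it is just as suitable for uncoordinated generation, keeping 74 random bits for collision resistance, but puts a 48-bit millisecond timestamp in front so values from different nodes still sort roughly by creation time. A minimal hand-rolled sketch following the RFC 9562 bit layout, using only the standard library (`uuid7` here is a hypothetical helper name, built by hand so it runs on older Python versions):

```python
import os
import time
import uuid


def uuid7(unix_ms=None):
    """Sketch of a UUIDv7 following the RFC 9562 bit layout.

    48-bit Unix timestamp (ms) | 4-bit version (7) | 12 random bits |
    2-bit variant (0b10) | 62 random bits
    """
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    if not 0 <= unix_ms < 1 << 48:
        raise ValueError("timestamp must fit in 48 bits")
    rand_a = int.from_bytes(os.urandom(2), "big") & 0xFFF            # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
    value = (unix_ms << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b
    return uuid.UUID(int=value)


earlier = uuid7(1_700_000_000_000)
later = uuid7(1_700_000_000_001)
print(earlier < later, earlier.version)  # True 7
```

IDs minted within the same millisecond are ordered only by their random bits in this sketch; RFC 9562 describes optional counter-based methods for stricter monotonicity.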