Grouping digits in SQL

JoelJacobson · on Sept 21, 2023

I'm truly impressed by the tactfulness of first making the previously unintentionally legal but undocumented syntax an error in PostgreSQL 15. This is a textbook case of "Virtuous Intolerance" [1].

By taking this step, the PostgreSQL community set up an invaluable real-world litmus test that lasted an entire year. This strategic intolerance allowed them to gather critical insights into whether any existing code would break and to understand whether such cases were intentional or not.

The decision significantly de-risked the introduction of the new, much-welcomed syntax in SQL and PostgreSQL 16. I think this decision framework serves as a model for other projects on how to introduce changes responsibly without breaking the ecosystem.

Kudos to the PostgreSQL team for such thoughtful engineering and to Peter Eisentraut for leading this exemplary four-year journey!

[1] The Harmful Consequences of the Robustness Principle: https://www.ietf.org/archive/id/draft-iab-protocol-maintenan...

noduerme · on Sept 21, 2023

"Move slow and try not to break stuff" would be a nice antidote to the last 20 years. I hope we all go that way.

Speaking of DB changes, I wish someone had even warned me what kind of crazy nearly impossible to debug chaos could erupt from pushdown optimization when I had to migrate a huge codebase to mysql 8.

derekperkins · on Sept 23, 2023

Very interested in the details

imrehg · on Sept 21, 2023

Here's a more up-to-date link to [1] as it is now published as RFC-9413: https://www.rfc-editor.org/rfc/rfc9413.html#name-virtuous-in...

JoelJacobson · on Sept 21, 2023

Thanks for the link!

hans_castorp · on Sept 21, 2023

I typically use 1e8 instead 100000000 when writing large (integer) numbers

fbdab103 · on Sept 22, 2023

In Python that returns a float.

    In [1]: 1e8
    Out[1]: 100000000.0
    In [2]: type(1e8)
    Out[2]: float

marginalia_nu · on Sept 21, 2023

Right, but what if you want to express something that does not end in a bunch of zeroes?

NeoTar · on Sept 21, 2023

I imagine there are cases where you want a large number not expressible in base-10 exponential notation. The obvious examples are the upper limits of integers - 2_147_483_647 and 4_294_967_295.

thaumasiotes · on Sept 21, 2023

Usually you'd want to express those as 0x7fff_ffff or 0xffff_ffff.

oever · on Sept 21, 2023

Adding syntax to improve readability of numbers is counterproductive. Improve the software the you use to read and edit the files. The extra syntax makes assumptions on what is easier to read.

fbdab103 · on Sept 21, 2023

  For example, this was accepted:
   SELECT 123abc;
  This would parse the same as
   SELECT 123 abc;

SQL is just not fair.

zoky · on Sept 21, 2023

To be fair, that’s not part of the SQL standard, it was just an unintended quirk in PostgreSQL resulting from a series of historical choices. It actually kind of makes sense how that happened; once the requirement to use AS was dropped, it became necessary to find other ways to delimit parameters in a statement, and anything beginning with a number would be delimited by the first non-numeric character. So if the parser sees a token that starts with a digit, it reads until the first non-numeric digit, strips out any following whitespace, and starts the next token at the next character. Probably nobody ever thought to test for this behavior because it’s not in the SQL standard to specifically reject such input.

rob74 · on Sept 21, 2023

What's a bit weird is that you can leave out the whitespace between the tokens. I'm not familiar with PostgreSQL, but I'm pretty sure "selectcount(*)fromtable" is not valid (although the parser could conceivably say "Ok, I have seen the token select, whatever comes after it must be the next token" - this would slow down parsing, but is theoretically possible)?

taway1237 · on Sept 21, 2023

Parsing and tokenisation are usually (at least conceptually) separate steps. Your example almost certainly tokenizes as selectcount, (, *, ), fromtable. It's plausible "select 123abc" would tokenize as select, 123, abc because of a code quirk.

paul-vernon · on Sept 21, 2023

select"this statement is valid""Postgres"from"table"

chasil · on Sept 21, 2023

This appears to come from Oracle. It is present in the final release of 8i.

  SQL> select version from v$instance;

  VERSION
  -----------------
  8.1.7.4.0

  SQL> select 123abc from dual;

         ABC
  ----------
         123