I'm truly impressed by how tactfully this was handled: the previously undocumented, unintentionally legal syntax was first made an error in PostgreSQL 15. This is a textbook case of "Virtuous Intolerance" [1].
By taking this step, the PostgreSQL community set up an invaluable real-world litmus test that lasted an entire year. This strategic intolerance allowed them to gather critical insights into whether any existing code would break and to understand whether such cases were intentional or not.
The decision significantly de-risked the introduction of the new, much-welcomed syntax in SQL and PostgreSQL 16. I think this decision framework serves as a model for other projects on how to introduce changes responsibly without breaking the ecosystem.
Kudos to the PostgreSQL team for such thoughtful engineering and to Peter Eisentraut for leading this exemplary four-year journey!
"Move slow and try not to break stuff" would be a nice antidote to the last 20 years. I hope we all go that way.
Speaking of DB changes, I wish someone had warned me about the crazy, nearly impossible-to-debug chaos that pushdown optimization can cause. I found out when I had to migrate a huge codebase to MySQL 8.
I imagine there are cases where you want a large number that is not conveniently expressible in base-10 exponential notation. The obvious examples are the upper limits of 32-bit integers: 2_147_483_647 (signed) and 4_294_967_295 (unsigned).
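To make that concrete, here is a small sketch in Python, which happens to accept the same underscore separators in integer literals. It is only an illustration of the readability argument, not PostgreSQL itself, and the constant names are my own.

```python
# Illustration only: Python integer literals accept underscore separators,
# similar in spirit to the syntax discussed for PostgreSQL 16.
INT32_MAX = 2_147_483_647    # largest signed 32-bit integer
UINT32_MAX = 4_294_967_295   # largest unsigned 32-bit integer

# The separators are purely visual; the values are ordinary integers.
assert INT32_MAX == 2**31 - 1
assert UINT32_MAX == 2**32 - 1

# Formatting with "_" groups the digits the same way on output.
grouped = f"{INT32_MAX:_}"
print(grouped)
```

Neither limit is a round power of ten, so exponential notation would not make them shorter or easier to check by eye.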
Adding syntax to improve readability of numbers is counterproductive. Improve the software that you use to read and edit the files. The extra syntax makes assumptions about what is easier to read.
To be fair, that’s not part of the SQL standard; it was just an unintended quirk in PostgreSQL resulting from a series of historical choices. It actually kind of makes sense how that happened: once the requirement to use AS was dropped, it became necessary to find other ways to delimit parameters in a statement, and anything beginning with a number would be delimited by the first non-numeric character. So if the parser sees a token that starts with a digit, it reads until the first non-numeric character, strips out any following whitespace, and starts the next token at the next character. Probably nobody ever thought to test for this behavior because the SQL standard does not specifically require rejecting such input.
What's a bit weird is that you can leave out the whitespace between the tokens. I'm not familiar with PostgreSQL, but I'm pretty sure "selectcount(*)fromtable" is not valid (although the parser could conceivably say "Ok, I have seen the token select, whatever comes after it must be the next token" - this would slow down parsing, but is theoretically possible)?
Parsing and tokenization are usually (at least conceptually) separate steps. Your example almost certainly tokenizes as "selectcount", "(", "*", ")", "fromtable". It's plausible that "select 123abc" would tokenize as "select", "123", "abc" because of a code quirk.
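A toy longest-match lexer shows how both results can fall out of the same rules. This is a minimal sketch in Python with a made-up `tokenize` helper; it is not PostgreSQL's actual scanner, and the token classes (digit runs, identifiers, single punctuation characters) are simplifying assumptions.

```python
import re

# Toy lexer, NOT PostgreSQL's real scanner. Each token is, in order of
# preference: a run of digits, an identifier, or any single character.
# Leading whitespace before a token is skipped.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_][A-Za-z_0-9]*|.)")

def tokenize(text):
    """Split text into tokens using greedy (longest-match) rules."""
    text = text.rstrip()  # avoid trying to match a token in trailing space
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

print(tokenize("selectcount(*)fromtable"))
print(tokenize("select 123abc"))
```

Under these rules "selectcount" stays one identifier because letters keep extending the match, while "123abc" splits because the digit rule stops at the first non-digit and the identifier rule picks up from there. A stricter lexer would have to reject a digit run that is immediately followed by an identifier character, which is roughly the kind of input that became an error.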
[1] The Harmful Consequences of the Robustness Principle: https://www.ietf.org/archive/id/draft-iab-protocol-maintenan...