
How’s Snowflake doing?

One thing these database companies have going for them is that once a customer's data and code are locked into a particular database, it becomes very hard to get out. Not impossible, just very difficult.



I've heard this line many times, but is it actually true? Are there stories or statistics to back it up?


I've done several database migrations, and it's true, but not because the data is hard to extract. It's moderately easy to dump out CSV or JSON files to S3 and load them back into another database (only moderately, because you have to deal with fiddly encoding issues). So in that sense the data is easily extracted. Tools like AWS Database Migration Service, while not perfect, can also make this a lot easier.
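The dump-and-reload flow above can be sketched in miniature. This is a toy round-trip using sqlite3 as a stand-in for both source and destination databases (in practice these would be driver connections to, say, Postgres or Snowflake, and the CSV buffer would be a UTF-8 file pushed to S3); the table and names are hypothetical:

```python
import csv
import io
import sqlite3

# Stand-in source database with a non-ASCII value in it.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE users (id INTEGER, name TEXT)")
src.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ana"), (2, "José")])

# Dump to CSV. Being explicit about encoding, quoting, and NULL
# handling is exactly the "fiddly" part: if source and destination
# disagree on any of them, the load silently corrupts data.
buf = io.StringIO()
writer = csv.writer(buf)
for row in src.execute("SELECT id, name FROM users ORDER BY id"):
    writer.writerow(row)

# Load the dump into the destination database.
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE users (id INTEGER, name TEXT)")
buf.seek(0)
rows = [(int(i), name) for i, name in csv.reader(buf)]
dst.executemany("INSERT INTO users VALUES (?, ?)", rows)

print(dst.execute("SELECT name FROM users WHERE id = 2").fetchone()[0])
```

The mechanics really are this simple; the hard part, as the rest of the comment argues, is everything built around the data.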

However, at least three issues make a database migration non-trivial:

1. Custom data types that have to be replicated somehow in the new database; this usually involves figuring out how to parse the output representation in a sensible way.

2. User-defined triggers and functions, or custom database features, that may not be available on the new database.

3. There is usually a lot of infrastructure built on top of a database that can be hard to switch over, for example if you've used Postgres roles heavily for access control. Not to mention any business intelligence tool built on top, which tends to accumulate a lot of hand-rolled SQL. Periscope, for example, can be database-agnostic, but the queries you write in it are often customized for one particular database vendor.

So in my view, it's not really the data that's the concern (except for case 1); it's the operational concerns that make database migrations hard.
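To make case 1 concrete: a custom type usually dumps as a textual representation the target database won't understand, so you end up writing small parsers for it. Here is a toy sketch (my own, not a production parser) for a Postgres-style one-dimensional array literal; real literals have more corner cases (escapes, nested arrays, quoted "NULL"):

```python
def parse_pg_array(text: str) -> list:
    """Parse a Postgres-style 1-D array literal like '{1,2,3}'
    or '{"a,b",NULL,c}' into a Python list (toy sketch)."""
    assert text.startswith("{") and text.endswith("}")
    body = text[1:-1]
    if not body:
        return []
    items, buf, in_quotes = [], [], False
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            # Quotes delimit elements that contain commas.
            in_quotes = not in_quotes
        elif ch == "\\" and in_quotes:
            # Backslash escapes the next character inside quotes.
            i += 1
            buf.append(body[i])
        elif ch == "," and not in_quotes:
            # Unquoted comma ends the current element.
            items.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    items.append("".join(buf))
    # Bare NULL denotes a missing value.
    return [None if it == "NULL" else it for it in items]

print(parse_pg_array('{"a,b",NULL,c}'))
```

Multiply this by every custom type in the schema and the "figure out how to parse the output representation" line item stops looking trivial.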


4. Egress costs from the clouds are non-trivial for large datasets. So now you're locked into both your db and your cloud.


Even the migrations that appear small end up being huge. Over a decade ago, I was working for a large biotech company that was having trouble with its Oracle installation. The technical team, being very conservative, and having never worked anywhere else, decided that the easiest way out was to purchase a large Exadata machine, as that was the less risky proposition: it was still supposed to be the same Oracle they knew and loved, just faster and on far bigger iron than they were running before. So they ignored the scary price tag, believing they would save a mint by not having to recreate their data store.

Well, things weren't that easy. It was Oracle all right, so all the queries, triggers, and views still worked... except the performance characteristics were completely different, because Exadata does all kinds of interesting things to extract more performance. All the hints and manual query plans? Just downright wrong, and often slower than before. In the end, easily a third of the queries needed rewrites, on top of all the internal tuning changes the very large DBA team made to keep queries performing well. The whole effort cost many thousands of man-months, on top of the Oracle hardware and the never-cheap Oracle license.

So even a migration from Oracle to a different flavor of Oracle can end up freezing a department with hundreds of tech workers for a year, and this was considered the cheapest choice! Imagine how much fun it would be if it were a large lift that, say, went from an RDBMS to NoSQL.


I concur. I was project manager for a migration from Microsoft SQL Server to Azure SQL. There were all sorts of problems, performance being one of them.


My team built our product on top of a very capable NoSQL database called MarkLogic. It had issues, but also had a lot of things it did well.

But then the original licensing agreement we had negotiated was coming to an end, and the new deal was going to be massively more expensive.

So we used this as an opportunity to rearchitect our monolith into services built on other open-source data stores, which also fixed a lot of the architectural issues and technical debt we had accumulated.

So to answer your question, yes it’s possible to move off of a database, if the financial incentives are great enough.


Database refactorings tend to be measured in millions of dollars and in quarters to years, from a consulting perspective.

Some context: this is for expansive legacy application databases that have many years of tenure or large data warehouse applications. If you've ever worked in the enterprise data space, you'll be tripping over this sort of workload left and right.


One anecdote: consider how long it took Amazon to get off Oracle. They had all the engineering talent, a huge budget, and all the political will, and it still took them ages to migrate.


There is a reason why IBM still makes a lot of money on IMS.


It's not difficult at all. We just moved a bunch of use cases from Snowflake to Pinot. All I had to do was an S3 export from Snowflake and an import into Pinot.



