
Database startups suffer from an extreme case of an important truism in startups generally: the product you think you are selling is not the product the customer thinks they are buying. No one buys a database per se, they are just a means to some other end.

People who love database technology — that would be me — tend to start database companies. It is very difficult to sell a database. It is much, much easier to sell a compelling solution to a somewhat boring but very valuable business problem that just happens to require an amazing database capability behind it. That's your moat: the customer doesn't actually care that there is an amazing database engine behind it, but it makes the product difficult for competitors to replicate.

Filed under “lessons I learned the hard way”.



On the other hand, any kind of technology in infrastructure is a prime candidate for open source, the incentives are just very much aligned there.

Infrastructure tech is a classic case of a shared good: it helps everyone while giving nobody a competitive advantage, which is why companies love building or sponsoring it.

At the same time, open source tech in infrastructure helps get rid of vendor lock-in, since in the worst case you can still look inside the source code and patch things. Fancy FOSS tech also makes for good community building and shiny job ads, both of which attract good devs.

Both effects together explain to me why open source makes for such a fierce competitor in any kind of paid infra tech: it's hard to build the momentum to outrun it.

And open core? Pretty much dead by now: as soon as the cloud providers see there's traction around any technology, they will build a hosted version, either directly or in a protocol-compatible fashion.

Techies really dig this stuff (I do too), but as a business model, there aren't many worse ones in 2021.


Some incentives are aligned for open source data infrastructure but others are clearly misaligned. In practice we just get a different type of less-than-great outcome. Open source data infrastructure is not a solved problem.

A question that lacks a satisfactory answer in open source data infrastructure is this: who is going to pay for the initial architecture to be state-of-the-art? Architecture is forever: if you start with a naive design, it will put the platform at a long-term disadvantage. Most open source infrastructure projects are started by well-meaning people who have no idea what a state-of-the-art design even looks like. At the periphery you might be able to burn a giant pile of VC money to make it happen, but that is not sustainable without a clear path to a profitable exit. The set of people with the critical expertise and the set of people building open source data infrastructure are nearly disjoint.

Today, the money is in state-of-the-art bespoke data infrastructure. If the tradeoff is "open source" versus "10-100x better on infrastructure KPIs", companies happily pay millions for the latter. People who know how to design this infrastructure are extremely well paid (7-figure packages are common) and demand is high. Yes, they could take a year off work and create an open source infrastructure project, but this doesn't seem to happen, and it is easy to understand why: there are few incentives and many disincentives. This is roughly my area of business; I see the dynamics from the inside, and the trend is not favoring open source.

If all of the cutting edge infrastructure development work is happening in closed source, whether cloud or state-of-the-art bespoke, it doesn't bode well for the long-term relevance of open source.


Cloud providers are proprietary infrastructure and their rapid growth shows the opposite.

SQL Server and Oracle just led to AWS DynamoDB/Aurora; you don't have the source code for any of them. Open source projects growing in popularity doesn't override the greater trend that companies just want solutions, whether open or not.


And if anything, the cloud is an even bigger moat. Which would you rather have to explain to your CTO as the cause of an outage:

"AWS had a problem" or "Hosted Super Open Source Database had a problem".

There is a reason for the old adage: "No one got fired for going with (IBM/Microsoft/AWS)."


This is exactly why Oracle in the '90s and '00s started buying up application software companies (PeopleSoft, JD Edwards, Hyperion, etc.).

In the old on-premise world, you could then sell them both the app and require the use of your database, which was an additional sale.

This strategy doesn’t work in a cloud world, though, since customers are no longer buying the individual components (like the database) but are instead just leasing the finished good. By definition, they don’t care what’s under the hood, unless you’re selling to another software company that’s using your database to make its finished good.


IIRC, Oracle managed to get a huge, well paying customer (DoD) behind their "we build a better database" pitch.

That provided enough funding, users and support to move into applications. In fact, even if they hadn't expanded into other areas, that support could have kept them a profitable DB company for decades.


Was this in their earliest days? Reminds me of what I read recently about IBM getting a significant government contract, and even more importantly the benefits of research, in the early days of computing when they were actually behind. After that they got so far ahead that they stayed the leader until the industry changed.


Ten years ago, when I still worked in consulting, all my clients (large institutional companies) used Oracle and they'd say they were willing to pay the premium because Oracle was 20% faster than everyone else.


> ...the product you think you are selling is not the product the customer thinks they are buying.

Your comment reminds me of April Dunford's post on how she helped position a database product: https://www.thefxck.com/interviews/product-positioning-april...


I don’t know if this is a corollary, but when I try to come up with an example of someone getting rich selling databases, I think of companies known more for their sales than their engineering.


If you're thinking of the same company I'm thinking of, I thought it was better known for its legal department than its sales department.


Or for their predatory anticompetitive behaviour?


Beautiful, this is exactly what's been on my mind for a while. After building a ton of "cool" infrastructure technology, I got recommendations from friends to sell it. But in the back of my mind somewhere I had going on what you've now made clear - that people won't actually want to _buy_ that technology, they'd much rather buy the business-specific stuff that is made possible _because_ of that technology.


This is the key. I'm also a dreamer of a better RDBMS (https://tablam.org) and wish I could make a living doing it... but working in the small-business sector, what they want is a better Access/Excel. Probably that is what bigger companies want too.

So the internal tech is just a means to make that possible. I think if a DB engine provided the equivalent of Django's auto-admin, it would sell itself easily :)
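For what it's worth, Django's auto-admin works because the schema is introspectable, and most engines already expose the raw ingredients an engine-native equivalent would build on. A minimal sketch of the introspection half using SQLite (the table here is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# PRAGMA table_info is SQLite's schema-introspection hook; an auto-admin
# would walk this metadata to generate list views and edit forms.
for cid, name, coltype, *rest in conn.execute("PRAGMA table_info(customer)"):
    print(f"{name}: {coltype}")
# -> id: INTEGER
#    name: TEXT
#    email: TEXT
```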

However, this is also hard because DB engines touch several complex things (storage, concurrency, compilers, interpreters, etc.), and it is hard to find people for this, plus the funding.


There's a bit of a chicken-and-egg problem: until you have compelling products demonstrating how your “better” RDBMS contributes to better business solutions, the technical people who understand the theoretical advantages have nothing to take to the less technical people who need to approve the decision, to justify it over established RDBMSs, which are not only seen as the safer choice from a business perspective but are also easier to hire qualified admins for.


Yeah, it's like building an OS, or in fact any infrastructure project. The key is to have a clear north star so the project knows what it needs to nail on the first try...


Yes. The spatial database startups I've had experience with, e.g. CartoDB (now Carto) and GeoSpock, transitioned from selling database tech to selling turnkey GIS (Geographical Information Systems) solutions, often with industry specific specialisations.


How’s Snowflake doing?

One thing these database companies have going for them is that once a customer company’s data and code are locked in to a particular database, it becomes very, very hard to get out. Not impossible, but very difficult.


I've heard this line many times but is this actually true? Are there stories or statistics of these things?


I've done several database migrations, and it's true, but not because the data is hard to extract. It's moderately easy to dump out CSV or JSON files to S3 and load them back into another database (only moderately because you have to deal with fiddly encoding issues). So in that sense the data is easily extracted. Tools like AWS Database Migration Service, while not perfect, can also make this a lot easier.
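The dump-and-load step just described can be sketched in a few lines. Table and file names here are made up, and sqlite3 stands in for both source and target; pinning the encoding explicitly is exactly where the "fiddly encoding issues" tend to bite:

```python
import csv
import sqlite3

# Hypothetical source database; any DB-API connection works the same way.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE users (id INTEGER, name TEXT)")
src.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Grácia")])

# Dump to CSV with an explicit encoding; leaving the encoding implicit
# is how non-ASCII names get mangled mid-migration.
with open("users.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name"])
    writer.writerows(src.execute("SELECT id, name FROM users"))

# Load back into the (equally hypothetical) target database.
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE users (id INTEGER, name TEXT)")
with open("users.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    dst.executemany("INSERT INTO users VALUES (?, ?)", reader)
```

In a real migration the file would land in S3 rather than on local disk, but the shape of the work is the same.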

However, at least three issues make a database migration non-trivial:

1. Custom data types that have to be replicated somehow in the new database; this usually involves figuring out how to parse the output representation in a sensible way.

2. User-defined triggers and functions, or custom database features, that may not be available on the new database.

3. There is usually a lot of infrastructure built on top of a database that can be hard to switch over, like if you've used roles substantially in Postgres for access control. Not to mention any business intelligence tool built on it that tends to have a lot of hand-rolled SQL. For example, Periscope can be database-agnostic, but any queries you write might be customized for one particular database vendor.

So in my view, it's not really data concerns (except case 1); there are a lot of operational concerns that make database migrations hard.
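Point 1 in miniature: a value like a Postgres array comes out of a dump as a text representation such as "{1,2,3}", and the loading side has to parse it into something the new database understands. A toy sketch, deliberately ignoring quoted elements, escapes and NULLs, which a real parser must handle:

```python
def parse_pg_array(text):
    """Parse a simple Postgres-style array literal like '{1,2,3}'.

    Toy version: assumes unquoted, comma-free elements only.
    """
    inner = text.strip()[1:-1]  # drop the surrounding braces
    return inner.split(",") if inner else []

print(parse_pg_array("{1,2,3}"))  # -> ['1', '2', '3']
print(parse_pg_array("{}"))       # -> []
```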


4. Egress costs from the clouds are non-trivial for large datasets. So now you're locked into both your db and your cloud.


Even migrations that appear small end up being huge. Over a decade ago, I was working for a large biotech company which at the time was having trouble with its Oracle installation. The technical team, being very conservative, and having never worked anywhere else before, decided that the easiest way out was to purchase a large Exadata machine, as that was the less risky proposition: it was still supposed to be the same Oracle they knew and loved, just faster, and on far bigger iron than they were running before. So they ignored the scary price tag, believing they were going to save a mint by not having to recreate their data store.

Well, things weren't that easy. It was Oracle all right, so all the queries, triggers and views still worked... except the performance characteristics were completely different, because Exadata does all kinds of interesting things to extract more performance. So all the hints and manual query plans? Just downright wrong, and often slower than before. In the end, easily a third of the queries needed rewrites, plus all the internal tuning changes that the very large DBA team made to get queries performing well. The whole effort cost many thousands of man-months, on top of the Oracle hardware and the never-cheap Oracle license.

So even a migration from Oracle to a different flavor of Oracle can end up freezing a department with hundreds of tech workers for a year, and this was considered to be the cheapest choice! Imagine how much fun this would be if it was a large lift that, say, went from a RDBMS to NoSQL.


I concur. I was project manager for a migration from Microsoft SQL Server to Azure SQL. There were all sorts of problems, performance being one of them.


My team built our product on top of a very capable NoSQL database called MarkLogic. It had issues, but also had a lot of things it did well.

But then the original licensing agreement we had negotiated was coming to an end, and the new deal was going to be massively more expensive.

So we used this as an opportunity to rearchitect our monolith into services built on other open source data stores, that also fixed a lot of the architectural issues and technical debt we had acquired.

So to answer your question, yes it’s possible to move off of a database, if the financial incentives are great enough.


Database refactorings tend to be measured in millions of dollars and quarters to years, from a consulting perspective.

Some context: this is for expansive legacy application databases with many years of tenure, or large data warehouse applications. If you've ever worked in the enterprise data space, you'll be tripping over this sort of workload left and right.


One anecdote: consider how long it took Amazon to get off of Oracle. They had all the engineering talent, they had a huge budget, they had all the political will, and it still took them ages to migrate.


There is a reason why IBM still makes a lot of money on IMS.


It's not difficult at all. We just moved a bunch of use cases from Snowflake to Pinot. All I had to do was an S3 export from Snowflake and an import into Pinot.


This is basically our play. We took SQLite and wrapped a very compelling business app around it. None of our customers give a single shit that we use SQLite vs SQL Server, aside from passing comments on the IT deep dive calls.

"You don't need us to install any database software?"

"Nah, we have our own integrated data persistence mechanism."

"Cool. So about those firewall rules..."

That is about the extent to which our customers care. They are more interested in what actually shows up on their display and how stable the solution is than any particular technology choices we decided to make on their behalf.

No one wants to be responsible for picking a database engine. Pick it for your customers. That's part of the engineering.
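The "integrated data persistence mechanism" trick is roughly this: an embedded engine like SQLite runs inside the application process, so there is nothing for the customer's IT team to install. A minimal sketch (the invoices table is made up):

```python
import sqlite3

# The whole "database server" is just a library call and a file
# (here, memory): no install, no service, no firewall rules.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("INSERT INTO invoices (total) VALUES (?)", (99.95,))
conn.commit()

(total,) = conn.execute("SELECT total FROM invoices WHERE id = 1").fetchone()
print(total)  # -> 99.95
```

In production the connection would point at a file the app manages itself, which is what lets the vendor answer "you don't need us to install any database software?" with a yes.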


Out of all the database products out there, I probably trust SQLite the most to do what I ask of it.


Joel Spolsky wrote about this quite a few times; the architecture astronauts take over:

"What is it going to take for you to get the message that customers don’t want the things that architecture astronauts just love to build. The people? They love twitter. And flickr and delicious and picasa and tripit and ebay and a million other fun things, which they do want, and this so called synchronization problem is just not an actual problem, it’s a fun programming exercise that you’re doing because it’s just hard enough to be interesting but not so hard that you can’t figure it out."

https://www.joelonsoftware.com/2008/05/01/architecture-astro...


On the one hand, I usually love to drop in a bit of Spolsky wisdom myself, and that quote stands well on its own, but the article itself is unfortunately marred by the fact that the Microsoft products mentioned were just unsuccessful early attempts at iCloud and Google Accounts, both of which have since seen considerable "killer app"-level success. I guess it's easy with hindsight to say "ah yes, but smartphones".


And the return to timesharing systems.


Is that an everything-new-is-old description of cloud offerings?


I once saw a fascinating quote that went something like the following:

"As computing technology develops it becomes more efficient to take centralised computing resources and distribute them closer to the user. As network technology develops it becomes more efficient to centralise them again. Further advances redistribute and yet further advances recentralise. This pattern has been noticed several times in the history of computation."

And the most fascinating thing was that it was from decades ago, perhaps even 1960! I've never been able to find the quote again. Does anyone recognise it? Perhaps I imagined it.


I don't know the exact source, but the phenomenon has been noticed and commented on many times; I remember reading Dilbert jokes about it in the '90s.


Can you give examples of "Further advances redistribute"?

Also, I wonder if commodity cloud offerings such as AWS will change this?


Mainframes -> home PCs -> laptops -> smartphones: all advances miniaturizing and distributing computing closer to the edge.


Relevant piece of history on timesharing in relation to cloud computing by the legend that is Brian Kernighan:

https://youtu.be/O9upVbGSBFo?t=340 (05:40 - 10:50)


Indeed, with the browser being a replacement for X11, RDP and even SSH/Telnet.


You mean the thing that is literally just managed hosting rebranded with new shiny?


> paying untenable salaries to kids with more ultimate frisbee experience than Python, whose main job will be to play foosball in the googleplex and walk around trying to get someone…anyone…to come see the demo code they’ve just written with their “20% time,” doing some kind of, let me guess, cloud-based synchronization… between Microsoft and Google the starting salary for a smart CS grad is inching dangerously close to six figures and these smart kids, the cream of our universities, are working on hopeless and useless architecture astronomy because these companies are like cancers, driven to grow at all cost,



