They have 11 IIS servers but interestingly he says this:
What do we need to run Stack Overflow? That hasn’t changed much since 2013, but due to the optimizations and new hardware mentioned above, we’re down to needing only 1 web server. We have unintentionally tested this, successfully, a few times. To be clear: I’m saying it works. I’m not saying it’s a good idea. It’s fun though, every time.
It's interesting that some companies doing much less work on their machines than SO need clusters of hundreds of servers, while SO can serve the 57th-ranked site by global traffic (according to Alexa) from one physical machine.
Sure there is other stuff backing that one, but the next time you hear someone talk about big clusters and hundreds or thousands of nodes, just take a moment to appreciate how much can be done with one rack of gear these days.
We don't really cache that much, for example: every post, comment, user, etc. on the page is pulled and rendered live. We output cache certain pages for 60 seconds for anonymous users, but it has almost no performance impact (I think it's a 4% hit rate?)
Some things are cached, but each page render is very dynamic - more so than most I think? I don't have a great source of comparison for similar sites though.
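For readers unfamiliar with the idea, here is a minimal sketch of what "output cache certain pages for 60 seconds for anonymous users" could look like. This is not Stack Overflow's code (they run on IIS/.NET); the class name `AnonymousOutputCache`, the `render` callback, and the per-path cache key are all assumptions made up for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class AnonymousOutputCache:
    """Hypothetical output cache: rendered HTML is reused only for anonymous users."""

    def __init__(
        self,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, html)
        self.hits = 0
        self.lookups = 0

    def get_page(
        self,
        path: str,
        user_id: Optional[int],
        render: Callable[[str, Optional[int]], str],
    ) -> str:
        # Logged-in users always get a live render and never touch the cache.
        if user_id is not None:
            return render(path, user_id)

        self.lookups += 1
        now = self.clock()
        entry = self._entries.get(path)
        if entry is not None and now < entry[0]:
            self.hits += 1
            return entry[1]

        # Miss or expired entry: render live and keep the result for ttl_seconds.
        html = render(path, None)
        self._entries[path] = (now + self.ttl_seconds, html)
        return html

    def hit_rate(self) -> float:
        # Fraction of anonymous lookups served from the cache.
        return self.hits / self.lookups if self.lookups else 0.0
```

With a short TTL and a long tail of rarely visited pages, most anonymous lookups would find an expired or missing entry, which is one plausible reason a hit rate could stay in the low single digits.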
Nick, first of all, great write-up. It's really nice when people take time from their schedule to write such detailed information about how their internal systems work!
I could be wrong, but from the article and the numbers you posted it seems that you do cache 89% of your DB query results, so maybe this is what jldugger referred to:
504,816,843 (+170,244,740) SQL Queries (from HTTP requests alone)
5,831,683,114 (+5,418,818,063) Redis hits
Ah I see the confusion now. To clarify: we use redis for a great many things, not just serving HTTP requests. For example, we use redis lists and sets to continually recalculate mobile feeds for users - that's roughly 4 billion of the hits every day.
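To put rough numbers on this exchange, here is a back-of-the-envelope sketch. It rests on two loud assumptions that are not in the article: that each Redis hit stands in for one avoided SQL query, and that "roughly 4 billion" can be treated as exactly 4,000,000,000. The function name `cache_share` is made up for this example.

```python
# Daily figures quoted in the thread.
SQL_QUERIES = 504_816_843        # SQL queries from HTTP requests alone
REDIS_HITS = 5_831_683_114       # all Redis hits, not only HTTP-related ones

# Assumption: "roughly 4 billion" mobile-feed hits, rounded to an exact value.
MOBILE_FEED_HITS = 4_000_000_000


def cache_share(cache_hits: int, db_queries: int) -> float:
    """Share of lookups answered by the cache, assuming one hit == one avoided query."""
    return cache_hits / (cache_hits + db_queries)


# Naive reading: every Redis hit counts as a cached query result.
naive_share = cache_share(REDIS_HITS, SQL_QUERIES)

# Adjusted reading: drop the mobile-feed traffic that never involved an HTTP request.
adjusted_share = cache_share(REDIS_HITS - MOBILE_FEED_HITS, SQL_QUERIES)

print(f"naive share:    {naive_share:.1%}")
print(f"adjusted share: {adjusted_share:.1%}")
```

The adjusted figure is still only an upper bound on any "cached query" share, since the remaining Redis hits also cover uses other than caching SQL results.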
I interpret this as: Page render latency probably degrades, but the site doesn't go down. Slower loads certainly have a monetary cost, making 11 machines worthwhile, but in a pinch at least they won't go fully offline if 10 die.