It's all feels, though. If you stare into the void, the apocalypse is coming. OTOH, bringing an AI assistant to every person in the world to make their lives better is another perspective to take. It's all a matter of framing.
When Claude says "Shall I push it", it's way easier to just respond "yes" than it is to open a new terminal and run git push, and if you're being graded on how many AI tokens you use, saying yes looks even better for your metrics!
That's some of how geolocation works. A ping can't travel faster than the speed of light, so the round-trip time gives you a circle that the target has to be inside. Ping from enough places and the overlap of those circles gives you a good enough idea, if you're the Iranian Guard or otherwise.
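A rough sketch of that bound, assuming flat 2D coordinates in km and made-up probe positions (real systems work on a sphere and have to account for routing and queueing delay, so treat this as an upper bound only). The names `max_distance_km` and `is_feasible` are just for illustration:

```python
import math

# Speed of light in vacuum, in km per millisecond.
C_KM_PER_MS = 299_792.458 / 1000


def max_distance_km(rtt_ms: float, medium_factor: float = 1.0) -> float:
    """Upper bound on the distance to a host, given a round-trip time.

    medium_factor is the assumed fraction of c the signal travels at
    (1.0 is the hard physical limit; fiber is slower, so the real
    radius is smaller).
    """
    one_way_ms = rtt_ms / 2  # the ping goes there and back
    return one_way_ms * C_KM_PER_MS * medium_factor


def is_feasible(probes, point) -> bool:
    """True if `point` lies inside every probe's circle.

    probes: list of (x_km, y_km, rtt_ms) tuples, hypothetical planar positions.
    point:  (x_km, y_km) candidate location.
    """
    px, py = point
    return all(
        math.hypot(px - x, py - y) <= max_distance_km(rtt)
        for x, y, rtt in probes
    )
```

Each extra probe shrinks the region where `is_feasible` can still return True, which is the "ping from enough places" part.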
I got all the way to round 53, but it turned out that one of my semiaquatic tetrapod ancestors from the Carboniferous Period didn't perform on land as well as they would have liked, so that was it for me.
On this side of the equation I think you start pulling in customer context and risk analysis on the downside. What is the churn risk of operating at 99% vs 99.9% availability?
If your site is for B2B and impacts customers' own operations or revenue, you'll likely be wanting to chase the 99.9%. Customers won't tolerate the roughly 1.7 hours per week of downtime that 99% allows, and will churn.
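For the arithmetic behind those numbers, here is a quick sketch (the function name is my own, and it assumes downtime is measured against a full 168-hour week):

```python
HOURS_PER_WEEK = 7 * 24  # 168


def allowed_downtime_hours(availability: float,
                           period_hours: float = HOURS_PER_WEEK) -> float:
    """Downtime budget for a given availability target over a period."""
    return (1 - availability) * period_hours


for target in (0.99, 0.999):
    hours = allowed_downtime_hours(target)
    print(f"{target:.1%}: {hours:.2f} h/week ({hours * 60:.0f} min)")
```

So 99% allows about 1.68 hours a week, while 99.9% allows about 10 minutes.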
However, if the value your site creates is tolerant to those sorts of disruptions, where someone is just inconvenienced and can come back later, a large investment to move from 99% to 99.9% wouldn't be justified. There is literally no impact from the investment. The harder part is the reality: most investments will be somewhere in the middle, with ambiguity about the impact. IIRC, SRE principles do talk about this, in different terms, when setting SLOs.
I've heard some companies refer to the concept as economical thinking, which I think is a great way to think about it. It doesn't mean you'll always get it right, more that we embed being conscious about the ROI in our work.
I've also observed several engineers really struggle with this, especially when moving from big tech to startups, where it's really easy to import culture from another company. In the earlier stages of startup life, if you don't have product-market fit, it doesn't matter how good your availability is. Attention is a resource; make sure it's allocated to what creates value for the customer.
If the site has a direct competitor and non-sticky customers, you can often get accurate loss estimates from outages. For example, friends of mine at DoorDash would know when Uber Eats was down by the corresponding spike in traffic to their app. The competitor captures all the lost traffic.
Most enterprises will have a harder time quantifying losses, as some percentage of customers will come back later. To understand that, you need to look for a drop in completed purchase rates compared to site visits.
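A minimal sketch of that comparison, assuming you already have visit and purchase counts for a normal baseline window and for the incident window (the function and parameter names are hypothetical):

```python
def estimated_lost_purchases(baseline_visits: int,
                             baseline_purchases: int,
                             incident_visits: int,
                             incident_purchases: int) -> float:
    """Purchases you'd have expected at the baseline conversion rate,
    minus what actually completed during the incident window."""
    baseline_rate = baseline_purchases / baseline_visits
    expected = incident_visits * baseline_rate
    # Clamp at zero: a better-than-baseline window is not a negative loss.
    return max(0.0, expected - incident_purchases)


# Made-up numbers: 5% baseline conversion, 3% during the incident.
print(estimated_lost_purchases(10_000, 500, 2_000, 60))
```

This still overstates the loss if some of those visitors come back and buy later, so you'd want to widen the incident window to catch the rebound.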
For a SaaS, it's even more difficult, as customers are often held captive by long contracts and might tolerate SLA breaches up to a certain point. A reasonable, though fictional, proxy would be the revenue for the contract pro-rated against the uptime during that period.
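One way to read that proxy, sketched with made-up figures: take the contract revenue for the period and attribute the downtime share of it as the notional loss. The name `prorated_revenue_at_risk` is my own, and as said above the number is a fiction, not what the customer would actually claw back:

```python
def prorated_revenue_at_risk(contract_revenue: float,
                             uptime_fraction: float) -> float:
    """Contract revenue for the period, pro-rated by the fraction of
    time the service was down (1 - uptime)."""
    if not 0.0 <= uptime_fraction <= 1.0:
        raise ValueError("uptime_fraction must be between 0 and 1")
    return contract_revenue * (1 - uptime_fraction)


# Hypothetical: a 120k contract period served at 99% uptime.
print(round(prorated_revenue_at_risk(120_000, 0.99), 2))
```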