As a tech company, we’re constantly chasing this thing called uptime. But what is it really?
First off, I think that HUGE credit needs to go to SUSE for creating this parody music video. It’s not only VERY well done, but it also spoke to me as a techie (or geek), and it is effectively the basis of this article.
Speaking of uptime as a commodity, you’d immediately think that uptime was something tangible that you could pick up and put in your pocket… well, it almost is.
When a tech company providing a mission-critical service advertises itself, you’ll hear them talking about 99.9999999999999999999% uptime. I remember when NetDynamix (in one of the first sales meetings we had) was presenting to a potential client and they asked us how many 9’s we would guarantee them. Truth be told, I actually had NO idea what they were talking about, but I quickly retorted something along the lines of, “We prefer to focus on getting the job done and keeping you online as much as possible, and to focus on the 9’s only if we’re dropping the ball, which is something that’s not in our vocabulary.”
(I was sweating bullets as one normally does when they’re improvising)
So what is 99.99999%, what is uptime and why does it matter so much?
Basically, when we speak of uptime, we’re talking about the amount of time (expressed as a percentage) in a month that our servers HAVE BEEN or WILL BE up, online and providing you the service which you pay us for.
It’s kind of the same confidence that you want when pulling into a petrol station expecting them to have fuel so you can fill your car up.
So if you were a courier company and I told you that the petrol station around the corner from your office could only guarantee that they’d have fuel for you 75% of the month, you probably wouldn’t bank on them as a place to refuel your fleet of trucks. However… 99% sounds just fine, right?
The answer to that question would probably be yes, but there are plenty more 9’s that we can fit behind the missing decimal point…
When 99% just isn’t good enough!!!
I’m being serious. Sometimes 99% isn’t good enough.
I should probably start off by saying that uptime excludes any planned service outages, such as planned maintenance or regularly scheduled maintenance windows. So, if your provider has scheduled maintenance every Monday between 01:00 and 03:00, this two-hour window does not affect their uptime.
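To make the exclusion concrete, here is a minimal sketch of how a measured uptime figure could be calculated once the maintenance windows are taken out of the picture. The function name and the convention used (scheduled minutes are removed from the period before the percentage is worked out, and unscheduled downtime is assumed to fall outside those windows) are assumptions for illustration only; real service level agreements word this differently from provider to provider.

```python
def measured_uptime_percent(period_minutes, scheduled_minutes, unscheduled_minutes):
    """Uptime as a percentage of the time the service was supposed to be up.

    Assumed convention (hypothetical, not taken from any specific SLA):
    scheduled maintenance is removed from the period first, so it can
    never count against the provider.
    """
    in_scope_minutes = period_minutes - scheduled_minutes
    if in_scope_minutes <= 0:
        raise ValueError("the period must be longer than the scheduled maintenance")
    up_minutes = in_scope_minutes - unscheduled_minutes
    return 100 * up_minutes / in_scope_minutes


# A 30-day month with four Monday windows of two hours each,
# plus 10 minutes of unplanned downtime (made-up numbers).
month_minutes = 30 * 24 * 60
maintenance_minutes = 4 * 2 * 60
print(f"{measured_uptime_percent(month_minutes, maintenance_minutes, 10):.4f}%")
```

Note that the eight hours of maintenance make no difference to the result; only the 10 unplanned minutes pull the figure below 100%.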
Regular/scheduled outages aside, consider this:
- 99% uptime (often referred to as “two nines”) allows for a total of 14.4 minutes of downtime a day, which works out to 1.68 hours a week or 7.2 hours a month; whereas
- 99.9999999% uptime (often referred to as “nine nines”) allows for a total of 31.55 MILLISECONDS of downtime A YEAR
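The figures above are simple arithmetic: take the fraction of time you are allowed to be down (100% minus the uptime percentage) and multiply it by the length of the period. A rough sketch follows; the function names are made up for this example, and the period lengths are assumptions (a 30-day month and a 365.25-day year). With a 365-day year the nine nines figure comes out at about 31.54 milliseconds rather than 31.56, which is why you will see slightly different numbers quoted.

```python
# Period lengths in seconds. Assumptions: a 30-day month and a 365.25-day year.
PERIOD_SECONDS = {
    "day": 24 * 3600,
    "week": 7 * 24 * 3600,
    "month": 30 * 24 * 3600,
    "year": 365.25 * 24 * 3600,
}


def allowed_downtime_seconds(uptime_percent, period):
    """Unscheduled downtime (in seconds) that an uptime percentage permits."""
    downtime_fraction = 1 - uptime_percent / 100
    return downtime_fraction * PERIOD_SECONDS[period]


def describe(seconds):
    """Format a duration using the largest unit that keeps the number readable."""
    if seconds >= 3600:
        return f"{seconds / 3600:.2f} hours"
    if seconds >= 60:
        return f"{seconds / 60:.2f} minutes"
    if seconds >= 1:
        return f"{seconds:.2f} seconds"
    return f"{seconds * 1000:.2f} milliseconds"


for uptime in (99.0, 99.99, 99.9999999):
    for period in PERIOD_SECONDS:
        allowance = describe(allowed_downtime_seconds(uptime, period))
        print(f"{uptime}% uptime, per {period}: {allowance}")
```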
Now I don’t know how you feel, but as a business that relies heavily on multiple systems to get work done, and knowing my luck… Murphy would ensure that the 14.4 minutes of downtime would happen to me when I most needed my email or internet access. In fact, Murphy would ensure that my internet was down for 14.4 minutes and then my email directly after that for another 14.4 minutes.
I’m digressing. Just because the provider is offering 99%, that doesn’t mean we should expect to be down for 14.4 minutes every day. The figure is a ceiling on downtime, not a schedule.
Being realistic seems like a better option
I think back to when I spun the story to get out of answering the “how many 9’s will you give us” question.
Surely it’s better to strive for the best possible uptime rather than guaranteeing something unrealistic?
Yes and no. If I look at NetDynamix, because that’s the only real data I have, our various servers have had an uptime of 100% over the last 90 days.
However, in the same period of time, VARIOUS clients have been affected by internet and power outages for many reasons outside our control, from copper theft to bills not being paid.
Essentially what I’m saying here is that uptime guarantees are usually very focused on the part of the service a tech company delivers. Let’s look at the petrol station analogy again.
If I park my car at home and take a walk to the petrol station expecting them to fill my car up, I can’t very well be upset with them for being unable to do so since I left my car at home.
Similarly, if our services are up and all links and networks that get the service from us to you are up, but your office electricity is out due to cable theft, we are somewhat exonerated in this instance. 90%* of the time we don’t get blamed for problems outside the scope of our service. Techies, you’ve all experienced the 10% of the time when you get blamed for the internet being down when there’s a power outage? (am I right?)
So where do you place the pegs when dealing with your uptime guarantee?
Redundancy, Redundancy, Redundancy…. Redundancy!!!
We strive for 99.99% uptime at all times, which allows us a total of just under 5 minutes of unscheduled downtime a month (about 4.3 minutes in a 30-day month), and we’re proud that we meet this goal 99.999%* of the time. The big secret, of course, is that at this stage of the game, redundancy and proactive monitoring are the words that drive our network design and server/service implementation, because redundant systems are essential if you want to guarantee your clients only the best.
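A quick back-of-the-envelope sketch shows why redundancy has such a big effect on the number of 9’s. It uses the textbook formula for components running in parallel, and it leans on two big assumptions that real networks rarely satisfy perfectly: the redundant systems fail independently of each other, and failover to the surviving system is instant. The function name is made up for this example, and this is not a description of how NetDynamix calculates anything.

```python
def combined_availability_percent(single_percent, copies):
    """Availability of `copies` redundant systems where any one can carry the load.

    Assumes independent failures and instant failover (both optimistic):
    the service is only down when every copy is down at the same time.
    """
    if copies < 1:
        raise ValueError("at least one system is required")
    chance_one_is_down = 1 - single_percent / 100
    chance_all_are_down = chance_one_is_down ** copies
    return 100 * (1 - chance_all_are_down)


# Each extra copy of a "two nines" server adds roughly two more nines on paper.
for copies in (1, 2, 3):
    print(f"{copies} x 99% servers: {combined_availability_percent(99, copies):.6f}%")
```

In practice shared dependencies (the same power feed, the same upstream link, the same bad config push) break the independence assumption, which is exactly why proactive monitoring sits alongside redundancy.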
I know that many providers operating in our space don’t even own or manage their infrastructure, let alone have any redundant infrastructure sitting around waiting to be brought online when disaster hits.
How do you deal with uptime guarantees? Share your comments with us below.
* PUN INTENDED