People have always relied on third parties to provide services for them, and this is especially true in the technology sector. Think server providers, payment providers, image hosting, CSS & JS libraries, CDNs and so on. The list is endless. Using external providers is of course fine. Why reinvent the wheel, after all? You should be concentrating on what makes your product/service unique, not on already-solved problems. (It’s also the Linux ethos!)
Why should you care?
With that said, relying on other people’s services is obviously a problem if their service isn’t up. Luckily, most big service providers, and lots of smaller providers too, have status pages where they publish the current status of their systems and services (see ours here). These status pages are great during an unforeseen outage, as you can get the latest info, such as when they expect the issue to be fixed, without having to contact their support team, which is probably under a lot of strain due to the outage in question.
Lots of status pages even allow you to subscribe to updates, meaning you’ll receive an email or SMS (or even have them call a webhook for integration into your alerting systems) when there is an issue.
As much as everyone hates outages, they are unfortunately a part of life, and when it’s another service provider’s outage, there isn’t much you can do. (Ideally you should never have a single point of failure, i.e. you should design for high availability, but that is a blog post for another time.)
What can you do about it?
However, not all outages are unforeseen, and lots of common issues are easy to prevent ahead of time with some simple steps:
- Monitor the status pages and blogs of your service providers for warnings of future work that could affect you, and make a record of it
- Subscribe to any relevant mailing lists. These not only let you know about issues, but allow you to take part in a discussion around the issue and its effects
- Set up your own checks for service providers that don’t have a status page and/or an automated reminder system (we can help with this).
- Make sure that reminder notifications are actually being seen, not just received. You could have all of the warning time in the world, but if nobody reads the notification, you can’t action anything.
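For a provider without a status page, a home-grown check can be as simple as requesting one of their endpoints on a schedule and recording the result. Here is a minimal sketch using only the Python standard library. The function names, the "up" / "degraded" / "down" labels and the example URL are assumptions made for illustration, not anything a particular provider defines.

```python
import urllib.error
import urllib.request


def classify(status_code):
    """Map an HTTP status code (or None for 'no response') to a coarse label."""
    if status_code is None:
        return "down"          # could not connect, DNS failure, timeout
    if 200 <= status_code < 400:
        return "up"
    if status_code >= 500:
        return "down"          # the provider's side is failing
    return "degraded"          # 4xx: reachable, but not the answer we expected


def check(url, timeout=10):
    """Request `url` once and return 'up', 'degraded' or 'down'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify(response.status)
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx; the code is still useful
        return classify(exc.code)
    except OSError:
        # URLError and socket timeouts are both subclasses of OSError
        return classify(None)


# Example (hypothetical URL), run from cron or a scheduler of your choice:
#   print(check("https://api.example-provider.test/health"))
```

Treating 4xx responses as "degraded" rather than "down" is a judgment call. Whatever you choose, send the result somewhere a person will actually see it.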
Other things to consider
Your customers are likely to be more forgiving of an outage if it is somebody else’s fault, but they’re not gonna be happy if it’s your fault, and they are really not gonna be happy if it was easily preventable.
The two most common problems that fall into this bracket are domain name and SSL certificate renewals. Every website needs a domain name, and a huge number of sites use SSL in at least some areas. If your domain name expires, your site could become unavailable immediately (depending on your domain registrar and how nice they are).
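As a rough sketch of a domain expiry check, many registries publish registration data over RDAP, and the JSON response usually includes an "expiration" event. The example below assumes the public rdap.org redirector as the lookup endpoint and assumes your TLD’s registry exposes an expiration date, which not all of them do. The helper names are made up for this post.

```python
import json
import urllib.request
from datetime import datetime, timezone


def expiry_from_rdap(rdap_doc):
    """Return the 'expiration' event of an RDAP domain response, or None."""
    for event in rdap_doc.get("events", []):
        if event.get("eventAction") == "expiration":
            # RDAP dates are RFC 3339 strings; older Pythons need 'Z' rewritten
            expiry = datetime.fromisoformat(event["eventDate"].replace("Z", "+00:00"))
            if expiry.tzinfo is None:
                expiry = expiry.replace(tzinfo=timezone.utc)  # assume UTC
            return expiry
    return None


def days_left(expiry, now=None):
    """Whole days between `now` (default: current UTC time) and `expiry`."""
    now = now or datetime.now(timezone.utc)
    return (expiry - now).days


def fetch_rdap(domain, base="https://rdap.org/domain/"):
    """Fetch the RDAP record for `domain` (network call; endpoint is an assumption)."""
    with urllib.request.urlopen(base + domain, timeout=10) as response:
        return json.load(response)


# Example: warn when fewer than 30 days remain
#   expiry = expiry_from_rdap(fetch_rdap("example.com"))
#   if expiry is not None and days_left(expiry) < 30:
#       print("Renew the domain soon!")
```

The parsing is kept separate from the network call so you can test it against a saved response before trusting it with real reminders.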
SSL certificate expiries can also cause your site to become unavailable immediately. On top of this, browsers will give nasty warnings about the site being insecure. This is likely to stick in the minds of some visitors, meaning it could damage your traffic and/or reputation even after the initial issue has been resolved. It’s also really easy to set up checks for these two things yourself.
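For the certificate side, here is a minimal sketch using Python’s built-in ssl module. It connects to a host, reads the certificate’s notAfter field and works out how many days remain. The function names and the 14-day threshold in the example are assumptions picked for illustration.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until(not_after, now=None):
    """Days from `now` until a certificate 'notAfter' string.

    `not_after` uses the format returned by getpeercert(),
    e.g. 'Jan 31 00:00:00 2030 GMT'. A negative result means already expired.
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


def fetch_not_after(host, port=443, timeout=10):
    """Open a TLS connection to `host` and return its certificate's notAfter string.

    With the default (verifying) context, an already expired certificate
    raises ssl.SSLCertVerificationError during the handshake, which is
    itself a signal worth alerting on.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Example: alert when fewer than 14 days remain
#   if days_until(fetch_not_after("example.com")) < 14:
#       print("Certificate expires soon!")
```

Run it on a schedule and route the alert to wherever your other reminders go, so it gets seen and not just received.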