[Tue, 10 Oct 2017] – US data center outage – why it was triggered and the measures taken thereafter

We’d like to shed some light on the US data center outage that we all experienced last week.

Here is a detailed rundown of what happened, which we hope will answer your questions about the situation.

The outage was caused by an accidental Emergency Power-Off (EPO) shutdown, triggered during routine maintenance of the on-premises safety system. As a result, the power supply to the racks was cut off in a way that prevented the UPS systems from taking over. This sudden, unbuffered power cut caused a network equipment failure, which prolonged the outage. Even though the power supply was restored shortly afterwards, our master DNS server remained ‘in the dark’ for lack of network connectivity.

Because the NS1/NS2 name servers were inoperable, all US data center-bound requests were redirected to the backup name servers in the UK and Finland, as provided for in our DNS redundancy plan. This, however, placed an enormous load on those servers, leaving them unable to serve any traffic at all. That is why websites hosted in the other data centers were also inaccessible during the US outage, even though the hosting servers themselves remained fully operational.
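For the technically curious, the overload effect can be illustrated with a minimal sketch. The server names, query volumes, and even redistribution below are hypothetical assumptions for illustration only, not actual measurements from our infrastructure:

```python
# Illustrative sketch: when name servers fail, their query load lands on the
# surviving ones. All names and numbers here are hypothetical.

def redistribute(load, failed):
    """Split the load of failed servers evenly across the surviving ones."""
    surviving = [s for s in load if s not in failed]
    extra = sum(load[s] for s in failed) / len(surviving)
    return {s: load[s] + extra for s in surviving}

# Normal state: each name server handles its own share of queries
# (arbitrary units, assumed equal for simplicity).
load = {"ns1.us": 100, "ns2.us": 100, "ns1.uk": 100, "ns1.fi": 100}

# NS1/NS2 go dark, so their traffic lands on the UK and Finland backups.
after = redistribute(load, failed={"ns1.us", "ns2.us"})
print(after)  # {'ns1.uk': 200.0, 'ns1.fi': 200.0} – each backup's load doubles
```

In this simplified model each backup absorbs twice its normal traffic; with real-world query volumes, that spike was enough to saturate the backup name servers entirely.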

While the US data center staff have been investigating the EPO shutdown incident, we, in turn, have taken the measures necessary to resolve the network unavailability and DNS failover problems and to prevent them from recurring.

We’d like to thank you for your patience and understanding and to assure you that we’re doing everything we can to ensure a much better long-term performance for your sites.