
Power Issues Cause Most Data Center Outages, Followed by Cooling

The percentage of outages caused by cooling issues is expected to rise as server density increases.

Dulles Airport during CrowdStrike outage. | reivax, CC BY-SA 2.0, via Wikimedia Commons

This is the fifth of a six-part series focused on Uptime Institute’s Global Data Center Survey 2024.

Data center outages continue to decline, according to Uptime Institute’s Global Data Center Survey 2024, with the percentage of data centers reporting an impactful outage during the previous three years dropping from 78% to 53%.

In an online briefing on the survey, Uptime’s executive director of research Andy Lawrence said that this bit of good news only partly offsets some of the data center problems we discussed here yesterday: “I think the industry should not be congratulating itself on sustainability, but in terms of resiliency, I think the data center sector is doing a very good job.”

While the number of outages is on the decline, the cost of outages continues to rise, with more than 20% of outages costing more than $1 million. The average cost of an outage is unknown, and according to Uptime research analyst Doug Donnellan, there’s a reason for that.

“If we were to try to calculate an average cost for an outage it would be very difficult, in some cases perhaps impossible, because there are some outages that are so costly that they would completely skew an average and completely invalidate the figures,” he explained. “I am thinking about the recent CrowdStrike incident, which has an estimated cost in the $9 billion range.”

Over half of all outages are caused by power issues, which Uptime’s CTO Chris Brown said should surprise no one.

“Power is everything,” he said. “If you think about it, everything in a data center uses power, and power is fairly binary: it’s either on or it’s off. The amount of time that you have to respond to a power event before something connected to it notices is a fraction of a second, so there’s a very limited amount of time. And since everything relies on power, power has many more vectors by which to cause your data center problems. Thus power is the biggest cause, and probably will be the biggest cause for a long period of time.”

The second largest cause of data center outages is cooling. Although cooling is currently responsible for only about 13% of outages, far fewer than power, Brown says we can expect cooling issues to cause more outages in the future.

“Mechanical is a lot more forgiving,” he said. “There’s more time between a mechanical outage and it impacting IT operations. But as the densities of racks continue to climb, that level of forgiveness is going to decrease.” As a result, he said, cooling will probably account for a larger percentage of outages.

Brown pointed out, however, that more power-hungry racks will also create additional infrastructure costs for data centers on the power side, as well as more failure vectors that could lead to downtime.

“We started out 20 years ago with 2N electrical systems, and then we’ve added different things that allowed us to share backup units across multiple power systems: distributed redundancy, catcher systems, those sorts of things,” he said. “All of that requires more intelligent switching controls and active equipment that can fail. As the cost of equipment continues to go up and our densities need to continue to go up, we’re going to have to have more and more innovative and creative solutions to try to manage those capital costs, which is going to add more active equipment, most likely, and add additional threat vectors for power systems.”

Tomorrow: Cloud and Provisioning.
