Availability vs. Reliability in Data Center Electrical Components


By Morena Sanidad, P.E., LEED AP

Reliability and availability are popular terms in the data center world. They are often used interchangeably and frequently misinterpreted. While they work together to define the success of a system, they are separate parameters, and each requires its own understanding. In this blog I will define the terms, describe how they apply to system components, and explain their relation to data centers.

What is Reliability?

In engineering, reliability is associated with the durability or quality of a component. It is defined as the probability that a given system or component will perform as intended under specified operational and environmental conditions over a given period of time, and it is expressed as a percentage.

For example, a UPS system whose inverter fails repeatedly is not performing its intended function. It has decreased reliability and could even be considered an unreliable system.

What is Availability?

Availability is associated with fault-tolerant systems. Fault tolerant means that a system can remain operational even when hardware components fail. For example, if a UPS breaks down because of its unreliable inverter, it will not cause an overall system interruption as long as an alternate UPS takes over the task. This extra feature in the design increases the reliability of the overall system by allowing some level of fault tolerance, and results in increased availability.

The reliability and availability of a given system or component can be calculated from its Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR). The MTBF is the predicted elapsed time between failures during operation. The MTTR is the average time needed to respond to a failure and repair the failed parts; this is a particularly crucial figure. Both parameters are published by the manufacturer, are quoted in hours, and are statistical values derived from historical data. They do not, of course, include failures related to the “human factor”.
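As a rough illustration of how such figures can be derived from historical data, the Python sketch below estimates MTBF and MTTR from a maintenance log. The `Outage` record, the `estimate_mtbf_mttr` helper, and the fleet numbers are all made up for this example; a manufacturer's actual method may differ.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One failure event from a hypothetical maintenance log."""
    repair_hours: float  # time from failure until the unit was back in service


def estimate_mtbf_mttr(operating_hours: float, outages: list[Outage]) -> tuple[float, float]:
    """Estimate MTBF and MTTR from observed history.

    MTBF = total operating (up) hours / number of failures
    MTTR = total repair hours / number of failures
    """
    if not outages:
        raise ValueError("at least one failure is needed to estimate MTBF and MTTR")
    failures = len(outages)
    mtbf = operating_hours / failures
    mttr = sum(o.repair_hours for o in outages) / failures
    return mtbf, mttr


# Made-up data: 50 units observed for one year each, with three failures.
log = [Outage(2.0), Outage(6.0), Outage(4.0)]
total_repair = sum(o.repair_hours for o in log)
operating = 50 * 8760 - total_repair  # up time excludes time spent in repair

mtbf, mttr = estimate_mtbf_mttr(operating, log)
print(f"MTBF ~ {mtbf:,.0f} h, MTTR ~ {mttr:.1f} h")
```

With only a handful of failures the estimate is very uncertain, which is one reason published MTBF values should be read as statistical guidance rather than a guarantee.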

In theory, you can calculate the reliability and availability of a component with the equations below:

• R = e ^ (-t ÷ MTBF)

• A = MTBF ÷ (MTBF + MTTR)

Where:

• R = Reliability

• A = Availability

• e = base of the natural logarithm, about 2.718 (e ^ x is the inverse of ln)

• t = period of time (hours)

• MTBF = Mean Time Between Failures (hours)

• MTTR = Mean Time To Repair (hours)

 

Examples:

• The reliability of an inverter with an MTBF of 100,000 hours is 64.5% over a 5-year period (43,800 hours), 83.93% over a 2-year period (17,520 hours), and 91.61% over a 1-year period (8,760 hours).

R = e ^ (-43,800 ÷ 100,000) = 0.645

R = e ^ (-17,520 ÷ 100,000) = 0.839

R = e ^ (-8,760 ÷ 100,000) = 0.916

• An inverter with an MTBF of 100,000 hours and an MTTR of 2 hours would have 99.998% availability. With an MTTR of 20 hours, availability drops to 99.98%.

A = 100,000 ÷ (100,000 + 2) = 0.99998

A = 100,000 ÷ (100,000 + 20) = 0.9998
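Figures like these are easy to check with a few lines of Python. The sketch below is only an illustration: the function names `reliability` and `availability` are my own, and a year is taken as 8,760 hours.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days; leap years ignored


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """R = e^(-t / MTBF): probability of running t hours without a failure."""
    return math.exp(-t_hours / mtbf_hours)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """A = MTBF / (MTBF + MTTR): long-run fraction of time in service."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


mtbf = 100_000  # hours, the inverter used in the examples

for years in (1, 2, 5):
    r = reliability(years * HOURS_PER_YEAR, mtbf)
    print(f"{years} year(s): R = {r:.4f}")

for mttr in (2, 20):
    a = availability(mtbf, mttr)
    print(f"MTTR = {mttr:>2} h: A = {a:.5f}")
```

Note how a tenfold increase in repair time costs roughly one "nine" of availability, while the reliability figure does not depend on MTTR at all.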

 

What does this mean for Data Centers?

Reliability emphasizes dependability in the lifecycle of a component, while Availability measures the capability of a system to provide a specified application service level to clients.

Availability offers a more realistic measure than reliability alone, especially for mission critical systems in a data center. Availability, as used in the data center world, is the operational availability, or uptime, requirement. It is usually expressed in ‘nines of availability’ (99% to 99.999%), meaning the percentage of hours per year that the system can continue its mission despite component failures.

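For example, the BICSI Data Center Standard relates availability levels to the allowable unscheduled downtime per year. At 99.999% availability, that allowance is only about 5 minutes per year (0.001% of 8,760 hours is roughly 0.09 hours), while 99.99% allows about 53 minutes.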
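To make the ‘nines’ concrete, here is a small Python sketch that converts an availability percentage into downtime per year. It is plain arithmetic rather than anything taken from the BICSI standard; the helper name `allowed_downtime_minutes` is my own, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Downtime per year permitted by a given availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% -> {minutes:,.1f} min/year ({minutes / 60:,.2f} h)")
```

Each additional nine cuts the downtime budget by a factor of ten, which is why the last nines are the most expensive to achieve.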

The more fault tolerant the system, the greater its reliability, and the greater its reliability, the higher its availability rating. In data centers, highly reliable components add to the availability level of the infrastructure and hence can increase the data center’s operational availability (uptime) rating. For a mission critical facility like a data center, greater reliability and availability are of utmost importance.
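The effect of fault tolerance on availability can be sketched numerically. The Python example below uses the standard textbook result for redundant units, 1 - (1 - A) ^ n, which is not a formula from this post and assumes that failures are independent and that any single unit can carry the full load. The function names are hypothetical, and real systems also have common-mode failures, transfer equipment, and human factors that this ignores.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """A = MTBF / (MTBF + MTTR) for a single unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability of `units` redundant units where any one can carry the load.

    Assumes independent failures: the system is down only when
    every unit is down at the same time.
    """
    if units < 1:
        raise ValueError("units must be at least 1")
    return 1 - (1 - unit_availability) ** units


single = availability(100_000, 20)            # one UPS: MTBF 100,000 h, MTTR 20 h
redundant = parallel_availability(single, 2)  # two UPS units, either can carry the load

print(f"single UPS:     {single:.6f}")
print(f"redundant pair: {redundant:.10f}")
```

Under these idealized assumptions, adding a second unit moves the system from roughly three nines to better than seven, even though neither UPS is individually any more reliable.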

Stay tuned for my next blog, “Increase Datacenter’s Availability Through the Design of a Reliable Electrical Distribution System”.

Follow Morena on Twitter @MSanidad_WH
