February 14, 2007
Redundancies: A Sense of False Security
President, SUCCESS by DESIGN
In CBM programs the objective is to manage a balance of risk to improve and optimize maintenance practices. With many of the ‘new’ concepts in CBM/RCM, a common mistake is made. Many people are under the impression that just because a system has a redundancy, it is 100% available and no other maintenance is required. This is not true!
Instead, a parallel/redundant system must be viewed in such a way that the analyst or team can evaluate the risk of the ‘system’ as a whole. For instance, I was once involved with a parallel cooling pump system designed to cool the bearing in a large generator. The primary machine failed, and the redundant machine had defects of its own. The redundant machine tripped and the whole generator came offline, with an impact in the millions of dollars.
Another case is the argument about alternating between redundant machines in the system. The prevalent thought is that this will somehow reduce the life of the system or components. In fact, it does not matter, because leaving equipment idle can have the same, or greater, impact on component reliability.
So, why are redundant systems treated as if they hold a ‘get out of maintenance free’ card? Simple: many do not understand the Reliability Function.
The Reliability Function relates the Mean Time Between Failures (MTBF) to the probability of failure. It can be represented as R(t) = 1 – F(t), where F(t) is the probability that the system will fail by time t. The MTBF is equal to the total operating time divided by the number of failures in that time; in this case we will measure it in ‘hours.’ Assuming a constant failure rate, R(t) is expressed as an exponential function of the time being studied and the MTBF, such that R(t) = e^(-t/MTBF), where t is the time being studied.
So, if I have a critical pump system that has an MTBF of 40,000 hours, and I want to know the chance of survival at 20,000 hours, it can be calculated as:
R(t) = e^(-20,000/40,000) * 100% = 60.6%
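The calculation above can be sketched in a few lines of Python. This is a minimal illustration, assuming a constant failure rate (the exponential model); the function name `reliability` and its parameters are made up for this example.

```python
import math

def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Chance of surviving to t_hours, assuming a constant failure rate."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return math.exp(-t_hours / mtbf_hours)

# Critical pump: MTBF of 40,000 hours, studied at 20,000 hours.
r = reliability(20_000, 40_000)
print(f"{r:.4f}")  # prints 0.6065
```

Carrying the extra decimal place shows that the survival figure is closer to 60.65%, which the text quotes as 60.6%.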
Now, in a series system, R(t) = (R(t)a)(R(t)b)(R(t)c)… where a, b, c represent the individual components in series. So, if we have three components in series with MTBFs of a = 10,000 hours, b = 40,000 hours and c = 25,000 hours, and we want to review the survival at 7,000 hours, the R(t) would be:
R(t) series = (0.497)(0.839)(0.756) = 0.315 * 100% = 31.5%
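The series calculation can be sketched the same way. This assumes the components fail independently and each follows the exponential model; `series_reliability` is an illustrative name, not part of any library.

```python
import math

def series_reliability(t_hours: float, mtbf_hours_list: list[float]) -> float:
    """Survival of a series system: every component must survive."""
    result = 1.0
    for mtbf in mtbf_hours_list:
        # Multiply in each component's own R(t) = e^(-t/MTBF).
        result *= math.exp(-t_hours / mtbf)
    return result

# Components a, b and c from the example, studied at 7,000 hours.
r_series = series_reliability(7_000, [10_000, 40_000, 25_000])
print(round(r_series, 3))  # prints 0.315
```

Note that the series result is lower than that of the weakest single component (0.497), since any one failure takes the system down.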
For parallel reliability where there are two systems, either of which can carry the load and which fail independently of one another, R(t) = R(t)a + R(t)b – (R(t)a)(R(t)b). Therefore, if we take the critical pump system above, with its MTBF of 40,000 hours, and place an identical pump system in parallel, the survival at 20,000 hours will now be:
R(t) = (0.606) + (0.606) – (0.606)(0.606) = 0.845 * 100% = 84.5%
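A short sketch of the parallel case follows. It assumes that the units fail independently, that any single surviving unit can carry the full load, and that each follows the exponential model; `parallel_reliability` is a hypothetical helper name. It uses the equivalent form 1 – (1 – R(t)a)(1 – R(t)b), which expands to the formula above for two units.

```python
import math

def parallel_reliability(t_hours: float, mtbf_hours_list: list[float]) -> float:
    """Survival of a parallel system: it fails only if every unit fails."""
    all_fail = 1.0
    for mtbf in mtbf_hours_list:
        # Probability that this unit has failed by t: F(t) = 1 - R(t).
        all_fail *= 1.0 - math.exp(-t_hours / mtbf)
    return 1.0 - all_fail

# Two identical pumps, each with an MTBF of 40,000 hours, at 20,000 hours.
r_parallel = parallel_reliability(20_000, [40_000, 40_000])
print(round(r_parallel, 3))  # prints 0.845
```

The independence assumption is the one the generator story calls into question: if the standby unit already carries defects, the real figure will be lower than this calculation suggests.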
Now, if the organization is willing to accept the improvement from about 61% to about 85%, then CBM would not be required. However, 85% survival still leaves roughly a 15% chance that both pumps will have failed by 20,000 hours. If one requires a much lower risk, such as expecting 100% availability, then additional work is required.
