(Failure Modes and Effects Analysis)
On September 29, 2016, a commuter train crashed in Hoboken, New Jersey, killing one person and injuring 108, with high speed being a factor. The root cause of the crash is under investigation.
A similar crash happened in Amagasaki, Japan in April 2005, where 106 were killed and 562 injured, and high speed around a curve was a factor. The conventional explanation of the root cause of the Amagasaki crash was corporate pressure on the driver to be on time. Drivers faced harsh penalties for lateness, including humiliating “training” programs that involved weeding and grass-cutting duties. In this case, the driver was speeding. The resulting countermeasure in Amagasaki has been to install an expensive $1 billion train speed control system on the small line to help mitigate a potential accident.
There have been many other high speed passenger train derailments, such as the Santiago de Compostela derailment in Spain in 2013 (79 dead, 139 injured out of 218 passengers), and the Fiesch derailment in Switzerland in 2010 (1 dead, 42 injured). The root cause explanation of these accidents tends to focus on the drivers driving faster than they should, and countermeasures tend to focus on semi-automated systems to control train speed.
Do we really know the root cause of these accidents, and are the countermeasures both effective and economic?
One of the best root cause analyses I’ve seen on the Amagasaki crash comes from Unuma Takashiro, and his conclusion is unconventional. Unuma-san is a Failure Modes and Effects Analysis (FMEA) consultant from Japan. FMEA is one of the best methods for analyzing a design to help prevent failures. It was developed in the aviation and space industries in the 1960s, adopted by the automotive industry in the 1990s, and is now prevalent in many industries, including health care.
Unuma-san argues that in the case of the Amagasaki crash, the speed control system is expensive and not fail-safe. One economical and effective countermeasure would be to add a $250,000 guard rail, which would likely prevent a recurrence and would certainly be useful as an additional layer of protection. The advantage of low-cost, effective countermeasures is that they can be widely deployed: at the figures quoted here, the cost of one $1 billion speed control system would cover about 4,000 such guard rails.
He argues the real root cause of this failure is that the overall engineering and management approach to mitigating failures was not adequate – both initially to prevent the accident in the first place, and subsequently after the crash by putting in the speed control system but not (also) the guard rail.
Unuma-san has a very interesting and useful website on FMEA practices, and uses the Amagasaki crash as one of many examples. He promotes an FMEA method that evaluates countermeasures in absolute terms, in contrast to the conventional FMEA, which evaluates them in relative terms. The problem with the relative evaluation method is that it can easily miss important failure modes that do not make an arbitrary priority cutoff. Missing important failure modes often leads to unexpected incidents.
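To make the cutoff problem concrete, here is a minimal Python sketch. It assumes the widely taught risk priority number (severity × occurrence × detection, each on a 10-point scale) as the relative method. The failure modes, their ratings, the cutoff of 100, and the `countermeasure_adequate` flag are all hypothetical, and the per-mode check is only a stand-in for an absolute evaluation, not Unuma-san’s actual procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (easy to detect) to 10 (hard to detect)
    countermeasure_adequate: bool  # hypothetical per-mode judgment

    @property
    def rpn(self) -> int:
        # Conventional risk priority number: product of the three ratings.
        return self.severity * self.occurrence * self.detection


def relative_shortlist(modes, cutoff):
    """Relative evaluation: keep only modes at or above the priority cutoff."""
    return [m.name for m in modes if m.rpn >= cutoff]


def absolute_review(modes):
    """Stand-in for absolute evaluation: judge every mode on its own merits."""
    return [m.name for m in modes if not m.countermeasure_adequate]


# Hypothetical ratings, invented purely for illustration.
MODES = [
    FailureMode("derailment on curve", 10, 1, 3, False),  # RPN 30
    FailureMode("door sensor fault", 4, 6, 5, True),      # RPN 120
    FailureMode("seat fabric wear", 2, 8, 7, True),       # RPN 112
]

if __name__ == "__main__":
    print(relative_shortlist(MODES, cutoff=100))
    print(absolute_review(MODES))
```

With these invented numbers, the catastrophic but rare derailment scores an RPN of only 30 and drops below the cutoff, while the per-mode review still flags it because its countermeasure is judged inadequate.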
He also analyzes the conventional FMEA approach and teachings, and points out many problems seen in industry:
- ineffective because of missing failure modes,
- done too late in the design process, making it more difficult and less likely to implement countermeasures,
- led by team members from other departments that are not responsible for the design, which both lowers the effectiveness of the analysis and can allow the designer to not be held fully accountable for the FMEA results,
- failing to promote economical countermeasures, and
- based on common FMEA teachings, many of which contain flaws that promote the above problems.
Unuma-san shows that many FMEAs confuse failure mechanisms (the physical, chemical, thermal, electrical, biological, or other stresses leading to the failure mode) with the actual failure modes (the ways a product or process can fail), which leads to missing failure modes. If a failure mode is missed, a countermeasure may never be identified for it, and so none is incorporated into the design.
He points out that the relative evaluation FMEA method promotes doing the FMEA on the entire design once enough of the design is done. Only when the FMEA has reached a certain level are all of the issues prioritized and then acted upon. The problem with this approach is that FMEAs take a lot of time, and by the time the results are ready, it can be too late to easily implement the recommended design changes. He proposes instead that the designers do the FMEA while they are doing the design, in a concurrent and “local” manner, evaluating each countermeasure in absolute terms against its individual failure mode. This makes it easier to get countermeasures into the design of the product or process in the early stages.
When non-designers take too much of the FMEA responsibility and scope, the effectiveness of the FMEA is reduced and the results are available late in the design process. The effectiveness is reduced because non-designers are unable to know all the key information in the heads of the designers, and the designers may feel less accountable for the FMEA quality. Results are delayed because instead of countermeasures being considered at the time of the design decision, they are made available after the design decision has been made and it is then more difficult and less likely to have any countermeasure implemented.
Unuma-san’s method is simpler than many FMEAs: it uses a four-point scale to the third power (64 ratings), whereas many conventional approaches use a 10-point scale to the third power (1000 ratings). He promotes determining countermeasures per failure mode, evaluating the likely success of those countermeasures, and checking whether there is an opportunity for optimization and lower costs from reducing overdesign.
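The rating counts above are simply the number of ways to combine three rated factors. As a small sketch, the hypothetical helper `rating_combinations` below enumerates them. It assumes only that each of the three factors is rated independently on the same scale, and says nothing about how either method combines or interprets the ratings.

```python
from itertools import product


def rating_combinations(scale_points: int, factors: int = 3) -> int:
    """Count every possible combination of ratings by enumerating them."""
    levels = range(1, scale_points + 1)
    return sum(1 for _ in product(levels, repeat=factors))


if __name__ == "__main__":
    print(rating_combinations(4))   # four-point scale, three factors
    print(rating_combinations(10))  # 10-point scale, three factors
```

The four-point scale gives 64 combinations and the 10-point scale gives 1000, so a team using the smaller scale has far fewer distinctions to debate for each failure mode.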
Unuma-san goes on to analyze the common teachings of FMEA by referring to much of the most common reference material available in books, training material, and websites. He shows many flaws, inconsistencies, and interpretation issues that tend to exacerbate the above problems. Much of the trouble with conventional FMEAs can be traced to poor teachings.
Unuma-san has consulted for a very long and impressive list of Japanese companies on FMEA in the transportation, health care, manufacturing, and consumer goods industries.
I’ve been a lead designer of multiple complex systems, and I’ve helped clients improve their product development processes, including FMEA. The teachings of Unuma-san resonate strongly with me. Too often I have seen poorly done FMEAs that miss critical failure modes, late FMEAs whose recommendations arrive too late to be useful, and FMEA study teams that don’t have enough participation by the design team. The absolute evaluation method FMEA is a substantial improvement over the relative evaluation method, mostly because it evaluates the likely success of countermeasures. I highly recommend his webpage on FMEAs, and it is linked here. It is a little hard to read because the English translation of the website isn’t the best, but it is worthwhile.
I think one of the reasons FMEA teachings have many issues is that few FMEA teachers have been skilled design engineers; they are instead people who gravitate to process design. The idea behind the FMEA is good and includes teaching early and effective analysis. Unfortunately, much of the applied practice falls short. A skilled design engineer naturally considers failure modes and tries to design them out, while simultaneously considering many other design tradeoffs, such as performance, function, economics, aesthetics, and ergonomics. I think that because the conventional FMEA trainers developed the applied practice of the FMEA, they have continued to build upon the original process of the relative assessment method, and have struggled to develop effective practices that overcome its shortcomings. In my experience, many design engineers have found FMEA to be a good idea but too slow, too time-consuming, and not effective enough to really embrace.
What I like about Unuma-san’s method is that it is practical, effective, and time-efficient, and that it evaluates the likely success of countermeasures. It can be very useful to have FMEA experts, trained in this method, who can help designers with training, facilitation, documentation, and review.
There are a few other improved FMEA methods that try to address some of the effectiveness and lateness problems of conventional FMEAs, such as “FMMEA” (Failure Modes, Mechanisms, and Effects Analysis), and there are good teachings in these methods as well. I have found Unuma-san’s method to be among the best, and it really resonates with me.
FMEA is one of the best methods to help avoid failures. By making the method more effective, products, processes, projects, and infrastructure can have fewer problems and be more economical. I highly recommend further study of this topic for engineers and managers delivering any system.
Craig Louie, P.Eng., Co-Founder, SysEne Consulting