Failure Mode and Effects Analysis in Various Project Stages
Developed by Magnus Held
Defects, downtime, injuries and delays can be extremely costly and often directly interfere with a company’s objective of delivering quality and reliability. So how can these events be prevented from occurring?
Extensive testing in the later phases of a product development project is one way to assure quality. However, discovering faults at this stage can be very costly and cause further delays. Instead, it is ideal to take preventive measures, which can be achieved by applying risk management tools such as Failure Modes and Effects Analysis (FMEA). FMEA is a systematic step-by-step approach for identifying, analyzing and preventing product and process failures. The tool is often applied in the early stages of a project, but can be used throughout in different variations.
This article aims to inform the reader on how to facilitate FMEA, how it can add value in different phases of a product development process, and what its limitations are.
In this article, a “product” is often referred to as a “system” consisting of sub-systems and components.
Before discussing the use of FMEA in different project phases, the general concept of the tool and its application will be outlined.
FMEA is a qualitative tool that investigates how a product or process might fail (failure mode) in delivering its intended function, and the consequences of that failure. The tool complies with the standards of Qualitative Risk Analysis defined by the PMBOK guide, which in short can be summed up as prioritizing risks for further investigation by assessing their probability of occurrence and severity. In practice, the tool's objectives and compliance with the standards can be formulated as:
- Identification and understanding of failure modes and their causes, and the effects of failure on the system or end users, for a given product or process.
- Risk assessment associated with the identified failure modes, effects, and causes, and prioritizing of issues for corrective actions.
- Implementation of corrective actions for the most critical failure modes along with evaluation after implementation.
FMEA can be categorized as a proactive risk reduction tool that, once initiated, becomes a dynamic driver of improvement. This is due to the iterative nature of risk identification, where new risks may emerge or existing risks may be revealed as the project advances through its lifecycle. The motive for implementing risk identification followed by risk reduction initiatives derives from the objective of changing characteristics of the system without significantly increasing cost. The ability to do so is highest at the project's start and decreases as the project progresses. A general rule of thumb for this phenomenon is the Factor of 10 rule, which states that the cost of changing characteristics of a product in a development project multiplies by a factor of 10 each time the project progresses into a new phase. The Factor of 10 rule is illustrated in the figure below.
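The Factor of 10 rule can be sketched as a small calculation. The phase names below are assumptions made for illustration; the rule itself only says that the cost grows tenfold with each new phase.

```python
# Hypothetical phase names; the rule only fixes the factor of 10 per phase.
PHASES = ["concept", "design", "production", "field"]


def relative_change_cost(phase: str, base_cost: float = 1.0) -> float:
    """Relative cost of changing a product characteristic in `phase`.

    The cost in the first phase is `base_cost`; each later phase
    multiplies it by 10.
    """
    return base_cost * 10 ** PHASES.index(phase)


for phase in PHASES:
    print(f"{phase:<12}{relative_change_cost(phase):>8.0f}")
```

Under these assumed phases, a change that costs 1 unit at the concept stage would cost 1000 units once the product is in the field.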
An FMEA is usually created in a spreadsheet, which gives the user an overview of complex systems with multiple components being inspected.
The spreadsheet example consists of 11 descriptive categories that define the FMEA investigation. These are outlined below:
1. Item The system, subsystem, component or process step chosen for investigation. For example, a burglary alarm consists of several components such as electronic hardware, battery, wiring, housing etc. All of these can be investigated as individual items if the FMEA team finds them relevant for inspection in a risk context. The burglary alarm can also be considered as one system "item"; however, it is usually easier to get an overview of a system by dividing it into subsystems and components.
2. Function The intended function of the system, subsystem, component or process.
3. Potential failure modes and identifying these Failure modes are the possible ways in which the intended function of the system, subsystem, component or process can fail. For example, a burglary alarm can fail because of a dead or missing battery, faulty wiring, a defective sensor, external interference etc.
Failure modes are the backbone of the FMEA tool, and identifying them lays the foundation for a successful and broad inspection of how a process or product might fail. This step underlines the need for a cross-functional team, since stakeholders have different interpretations and views of possible failure modes. Selecting a team with a broad field of experience relevant to the subject is therefore crucial for the success of the FMEA. The PMBOK standards dictate that, in order to establish expert judgement, the team needs to consist of individuals with specialized training or knowledge concerning the system being investigated, such as senior management, project stakeholders, project managers (with experience from similar projects), subject matter experts, industry groups and consultants, and professional and technical associations. For a product development project, individuals such as designers, production workers, end users and suppliers are also highly relevant experts. Once the team is assembled, the brainstorming of potential failure modes can be initiated. Different variations of brainstorming can be applied in order to explore as many failure modes as possible. When developing new products, data from similar products and production methods can be used as inspiration for possible failure modes.
As a baseline for the identification of failure modes Murphy’s law is introduced as a guiding statement: “Whatever can go wrong, will eventually go wrong”. No product is ever completely sound and it will eventually fail one way or the other.
4. Potential effect(s) of failure Consequence of the failure on the system, subsystem, component or process level. The effects are often divided into local effects and system effects. A local effect does not affect the function of the system as a whole, whereas a system effect does.
5. Severity (S) Ranking of the most serious effect for a given failure mode. The severity is often scaled from 1 to 10, where 10 is the most severe. The ranking can be a product of an assessment based on experience or supported by data.
6. Potential Cause(s) of failure Specific reason(s) for the failure modes to occur (root causes).
7. Occurrence (O) Describes the likelihood of a failure mode occurring and is often scaled from 1 to 10, where 10 is the most likely. It is important to use data (if available) to validate the occurrence ranking. This could be from current design controls.
8. Current design controls (prevention/detection) Methods or actions already in place to prevent or detect failure modes. These are linked to the ranking of the occurrence (prevention) and the ranking of detection.
9. Detection (D) Ranking of the possibility of detecting the cause of a failure mode. It is often scaled from 1 to 10, where 10 is the least likely to be detected. It is important to use data (if available) to validate the detection ranking. This could be from current design controls.
10. RPN (Risk Priority Number) Ranking of failure modes according to severity, occurrence and detection. The RPN value is found by multiplying these three values: RPN = S × O × D. The ranking provides a quantification of the failure modes, which enables the team to get an overview of the most crucial failure modes.
11. Recommended actions Tasks recommended by the FMEA team with the objective of reducing or eliminating the risk associated with the failure modes. These tasks need to have specified owners and deadlines.
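The spreadsheet columns and the RPN calculation can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed format: the class name `FmeaRow`, its field names and all rankings for the burglary alarm are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    """One row of the FMEA spreadsheet (the 11 categories listed above)."""
    item: str
    function: str
    failure_mode: str
    effect: str
    severity: int        # S: 1-10, 10 = most severe
    cause: str
    occurrence: int      # O: 1-10, 10 = most likely
    controls: str
    detection: int       # D: 1-10, 10 = least likely to be detected
    recommended_action: str = ""

    def __post_init__(self) -> None:
        # Reject rankings outside the 1-10 scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: RPN = S x O x D."""
        return self.severity * self.occurrence * self.detection


# Hypothetical rankings for the burglary alarm example.
rows = [
    FmeaRow("Sensor", "Detect intrusion", "Defective sensor",
            "Intrusion not detected", 7, "Component wear", 3,
            "Periodic self-test", 4),
    FmeaRow("Battery", "Power the alarm", "Dead battery",
            "Alarm does not sound", 8, "Battery not replaced", 5,
            "Low-battery indicator", 3),
    FmeaRow("Wiring", "Connect sensor and siren", "Faulty wiring",
            "Alarm does not sound", 8, "Installation error", 2,
            "Visual inspection", 6),
]

# Rank failure modes from highest to lowest RPN.
ranked = sorted(rows, key=lambda row: row.rpn, reverse=True)
for row in ranked:
    print(f"{row.item:<8} {row.failure_mode:<18} RPN = {row.rpn}")
```

With these assumed rankings, the dead battery (RPN 120) would be addressed first, followed by the faulty wiring (96) and the defective sensor (84).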
In order to conduct an FMEA effectively, it is highly recommended to follow a systematic step-by-step approach. D. H. Stamatis recommends an eight-step method consisting of the following steps:
1. Team selection and brainstorm
As stated in the section "3. Potential failure modes and identifying these", a cross-functional team with diverse knowledge about the process or product is required for a broad exploration of failure modes. The team defines and prioritizes the opportunities for improvement as a scope for the FMEA process. If specific issues have been raised by a supplier, customer etc., the direction is given. If the project concerns new development or continual improvement, tools such as brainstorms, storybook methods etc. can be used to determine the direction or area of focus.
2. Overview and team alignment
In order to align the team effort and establish an overview of the systems, subsystems, components and processes being investigated, tools such as functional block diagrams or process flowcharts are valuable assets. These tools provide both an overview and a working model of the systems, subsystems, components and processes, which ensures that everyone is on the same page and understands the problems associated with them.
3. Prioritization
Executing an FMEA on big systems or products consisting of many components can be time-consuming and very costly. Therefore, prioritization is often needed to establish where the main issues are located and where the FMEA can add the most value. Preferably, the prioritization is supported by data that verifies the issues. With smaller systems, or if the issues have been raised by a third party, the priority is given and the step can be skipped.
4. Data collection
The team collects data on the failures and categorizes them in the FMEA spreadsheet.
5. Analysis
The collected data is now used to fill out the columns of the FMEA spreadsheet described in the previous section (The tool). Tools such as cause-effect analysis, brainstorming, mathematical modeling etc. can be used to support the determination of severity, occurrence and detectability.
6. Results
The severity, occurrence and detectability values are multiplied in order to calculate the RPN. Afterwards, the failure modes are ranked according to this number.
7. Recommended actions and evaluation
The results are used to prioritize the recommended actions and determine where it is most crucial to act. Once the recommended actions have been completed, the team reevaluates and rescores the severity, occurrence and detectability for the top-ranking failure modes. This is done to determine the effectiveness of the recommended actions.
8. Repetition
FMEA has an underlying philosophy of facilitating continuous improvement with the long-term goal of eliminating every failure mode. In order to achieve that, repetition is key. In practice, it is almost impossible to completely eliminate every failure mode. Therefore, a critical RPN value is often chosen to determine when a failure mode is no longer worth investigating. This value differs considerably from project to project.
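The rescoring and the critical RPN value can be sketched as a simple filter. The function names, the rescored rankings and the critical value of 90 are hypothetical; treating a failure mode as open when its RPN is at or above the critical value is also an assumption, since the cut-off convention is chosen per project.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: S x O x D."""
    return severity * occurrence * detection


def still_open(rescored: dict, critical_rpn: int) -> dict:
    """Return the failure modes whose rescored RPN is at or above the
    critical value, i.e. those that go into the next FMEA iteration."""
    return {
        name: rpn(*scores)
        for name, scores in rescored.items()
        if rpn(*scores) >= critical_rpn
    }


# Hypothetical (S, O, D) rankings after the recommended actions were completed.
rescored = {
    "Dead battery": (8, 2, 2),      # action taken: occurrence and detection improved
    "Faulty wiring": (8, 2, 6),     # no action completed yet
    "Defective sensor": (7, 3, 4),
}

open_modes = still_open(rescored, critical_rpn=90)
print(open_modes)
```

In this sketch only the faulty wiring remains above the chosen critical value and would be carried into the next iteration.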
Variations of FMEA and their application in different project stages
In this chapter, the FMEA variations are discussed and linked to their respective roles in the different phases of product development. FMEA can be tailored to fit many different applications and industries, but for product development the tool is normally divided into three categories:
- Concept FMEA
- Design FMEA
- Process FMEA
In practice, the FMEA tool is very similar in all variations; what differs is the objective and scope. A road map of a product development process and FMEA can be seen below. It illustrates the product development process and the phases in which it is feasible to apply FMEA.
The Concept FMEA, also referred to as System FMEA, is the highest-level analysis of an entire system. The focus is on system-related deficiencies and their interconnection. The Concept FMEA is facilitated when concept alternatives are being considered. Such analyses are often defined as feasibility studies and can prove very valuable in eliminating poor design concepts with inherent risks. When considering many concept options, the efficiency of the process can be improved by focusing on each concept's primary functions and the corresponding failure modes, effects and causes that raise the greatest concern.
- Analyzes concept alternatives in the conceptual phase ranked by RPN, which helps determine the most feasible concept option.
- Assists in the process of identifying and eliminating risks.
The Design FMEA focuses on product design and the identification of potential risks caused by design deficiencies. The identification is normally performed at sub-system or component level, which addresses the effect of lower-level failures on system operation. For instance, a bicycle is made of multiple components, and in order to facilitate a thorough risk assessment it would be necessary to evaluate the different components individually (chain, tires, frame etc.) and afterwards relate their interconnection in the system and the system effect of their specific failure modes. Even though the Design FMEA is usually performed in the design phase, it can also be used to evaluate existing products and systems. The findings are then prioritized with a systematic approach and used to improve future designs.
When facilitating a Design FMEA, either a bottom-up or a top-down approach can be considered. Both approaches are illustrated below.
The top down approach assumes a system failure and identifies how that failure could occur by working downwards analyzing individual components related to the failure. This approach is often used when the complexity of the system is high and specific components and failure modes cannot be related without further investigation, or when the scope of the investigation focusses only on a set of specific risks. However, with a top-down approach, FMEA might only discover major failure modes in a system.
The bottom-up approach involves determining failure modes at component level and working upwards, analyzing their effect on the system. This approach is more thorough and ensures that all components are analyzed and considered accordingly. The bottom-up approach works well when every component has to be reviewed; however, it can be difficult to perform on complex systems or systems that are not well defined.
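The difference in scope between the two approaches can be sketched with the bicycle example. The failure modes, system effects and function names below are hypothetical, and a real system would have several levels of sub-systems rather than a flat mapping.

```python
# Hypothetical bicycle breakdown: each component failure mode is mapped
# to the system-level effect it causes.
COMPONENT_FAILURES = {
    "chain": {"chain snaps": "loss of propulsion"},
    "tires": {"puncture": "loss of control"},
    "frame": {"weld cracks": "loss of control"},
}


def bottom_up(component_failures):
    """Start at component level: visit every failure mode of every
    component and report its effect on the system."""
    for component, modes in component_failures.items():
        for mode, system_effect in modes.items():
            yield component, mode, system_effect


def top_down(component_failures, system_failure):
    """Start from one assumed system failure and work downwards to the
    component failure modes that could cause it."""
    return [
        (component, mode)
        for component, modes in component_failures.items()
        for mode, system_effect in modes.items()
        if system_effect == system_failure
    ]


print(list(bottom_up(COMPONENT_FAILURES)))             # every component is covered
print(top_down(COMPONENT_FAILURES, "loss of control"))  # only related components
```

The bottom-up pass touches all three components, while the top-down query for "loss of control" returns only the tires and the frame, which mirrors the trade-off between thoroughness and a narrower scope.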
- Establishes a priority for design improvement actions based on the failure modes and their prioritized ranking by the RPN.
- Documents the rationale for changes.
- Consolidates the system by corrective actions.
The Process FMEA focuses on manufacturing and assembly processes at the system, sub-system or component level, normally under the assumption that the design is sound. The objective of this variant is to ensure that the product meets design requirements safely, with minimal downtime, scrap and rework. The Process FMEA can encompass manufacturing, assembly, shipping, internal transport of materials, tool maintenance etc.
- Identifies process deficiencies prioritized by their RPN along with corresponding corrective actions.
- Identifies critical/significant characteristics that should be encompassed in production control plans.
- Documents the rationale for changes.
- Consolidates the system by corrective actions.
In order to determine which failure modes are the most crucial, these are quantified with a Risk Priority Number (RPN). The RPN is calculated from severity (S), occurrence (O) and detectability (D). This quantification is easy to apply and works well for prioritizing datasets with many components and failure modes. It provides an indication of relative risk; however, as seen in the table below with the three calculated RPN values for fictitious failure modes, the multiplication understates failure modes that combine high severity with low occurrence and detection rankings. A high-severity failure mode can in some cases cause fatal accidents, which would be catastrophic for most companies even though it is not likely to occur. Therefore, it is recommended to inspect failure modes with a high severity ranking individually so that they are not overlooked.
The RPN ranking is primarily done qualitatively and is therefore a product of the team's subjective perception of the failure modes and the corresponding severity, occurrence and detectability. It can be very time-consuming and expensive to collect objective data for each potential failure, hence the qualitative approach. The experience of the team and proper training for the investigation are therefore crucial for the success of the FMEA. However, the team will never be able to anticipate every failure mode, and its subjective assessment might be incorrect as well.
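The masking effect of the multiplication can be reproduced with a short calculation. The three failure modes and their rankings are invented for illustration and are not the values from the article's table; the severity threshold of 9 for individual review is likewise an assumption.

```python
# Hypothetical severity threshold above which a failure mode is reviewed
# individually, regardless of its RPN.
SEVERITY_REVIEW_THRESHOLD = 9

# Hypothetical failure modes as (name, S, O, D).
modes = [
    ("A", 10, 1, 2),  # potentially fatal, but rare and easy to detect
    ("B", 4, 6, 5),
    ("C", 3, 7, 8),   # minor, but frequent and hard to detect
]


def rpn(mode):
    """Risk Priority Number of a (name, S, O, D) tuple."""
    _, severity, occurrence, detection = mode
    return severity * occurrence * detection


# Ranking by RPN alone pushes the most severe failure mode to the bottom.
by_rpn = sorted(modes, key=rpn, reverse=True)
for mode in by_rpn:
    print(mode[0], rpn(mode))

# A separate severity check keeps it visible.
flagged = [name for name, severity, _, _ in modes
           if severity >= SEVERITY_REVIEW_THRESHOLD]
print("Review individually:", flagged)
```

Here failure mode A ends up last in the RPN ranking (20 against 168 and 120) although it is the only one with the highest severity, and the separate severity check is what brings it back into focus.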
Complexity of the system
FMEA is very effective when applied as a bottom-up tool used to analyze elements that cause the entire system to fail. However, when it is used as a top-down tool, the system can have multiple functions and a large number of components to be analyzed. When analyzing a big system, the FMEA team has to balance between taking on too large a scope and one that is too small. If the scope is too broad, the investigation will be very time-consuming due to the quantity of detailed system information, and important aspects might drown in an ocean of data. On the other hand, important aspects might be left out if the scope is too narrow.
These failure modes can be very difficult to anticipate since human behavior is subjective. Therefore, these are often first discovered during interaction with the system. The difficulty is increased when the effects of the environment significantly interfere with the interaction (rain, fog etc.).
Comparison with PMBOK standards
Compared to the Probability and Impact matrix, specified in the PMBOK standards under qualitative risk analysis, the FMEA can be seen as an extension that investigates each risk at a deeper level, illustrating more aspects along with recommended actions for risk reduction. However, it should only be applied when a more detailed investigation is necessary; otherwise it proves inefficient, time-consuming and costly. The two tools can also complement each other, where FMEA is used as a tool of investigation and the Probability and Impact matrix illustrates the findings as a product of severity and probability. This approach avoids the problem of high-severity failure modes being diminished by the multiplication of severity, detectability and occurrence.
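Placing FMEA findings in a Probability and Impact matrix can be sketched as follows. The three bands (low, medium, high), their cut-off values and the example findings are assumptions made for illustration; occurrence stands in for probability and severity for impact, while detection is left out.

```python
def band(score: int) -> str:
    """Map a 1-10 ranking to a matrix band (hypothetical cut-offs)."""
    if score <= 3:
        return "low"
    if score <= 7:
        return "medium"
    return "high"


def place_in_matrix(findings):
    """Group failure modes by (probability band, impact band)."""
    matrix = {}
    for name, severity, occurrence in findings:
        cell = (band(occurrence), band(severity))
        matrix.setdefault(cell, []).append(name)
    return matrix


# Hypothetical FMEA findings as (name, S, O).
findings = [("A", 10, 1), ("B", 4, 6), ("C", 3, 7)]

matrix = place_in_matrix(findings)
for (probability, impact), names in matrix.items():
    print(f"probability={probability:<7} impact={impact:<7} {names}")
```

In this sketch failure mode A lands in the low-probability, high-impact cell, so its severity stays visible instead of being multiplied away.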
FMEA, when facilitated by a capable team in the right time frame and addressing relevant systems, subsystems or components, can prevent costly product recalls, safety issues etc. by addressing problems before the product enters the market. As shown by the Factor of 10 rule (figure 1), it is far less costly to prevent problems through tools such as FMEA than to pay for expensive field problems. Furthermore, FMEA reduces development time by addressing the problems early and thereby eliminating costly and time-consuming test-and-fix phases. Customer satisfaction is also protected by eliminating failures before the users discover them, which is key in maintaining company reputation.
All in all, FMEA has the ability to reduce costs, enable faster development times, and meet high customer expectations.
- D. H. Stamatis, 2003, 2nd edition, "Failure Mode and Effect Analysis – FMEA from theory to execution", ASQ Quality Press. - Step-by-step approach for implementing FMEA and overview of phases the method is applicable to. Also includes ISO references and Six Sigma practices.
- Project Management Institute, 2013, "Project Management Body of Knowledge", 5th edition. - Identifies that subset of the project management body of knowledge that is generally recognized as good practice. "Good practice" means there is general agreement that the application of the knowledge, skills, tools, and techniques can enhance the chances of success over many projects.
- C. S. Carlson, 2012, "Effective FMEAs: Achieving Safe, Reliable, and Economical Products and Processes Using Failure Mode and Effects Analysis", John Wiley & Sons, Inc. - Procedures for doing FMEAs and how to successfully apply them in various phases.
- S. K. Sethiya (chief mechanical engineer, West Central Railway at Jabalpur), 2004, "Failure Mode and Effects Analysis (FMEA)". - General concept of FMEA with an outline of top-down and bottom-up approaches.
- Mike Silverman, 2013, 2nd edition, "FMEA on FMEA", IEEE. - FMEA performed on FMEA, exploring how FMEAs go wrong and how to avoid these pitfalls.