Every manufacturing engineer has lived through the same ritual. A new
product is launching. Someone remembers that the customer requires an
FMEA. A meeting is scheduled. A cross-functional team gathers around a
conference table with coffee and a laptop. Someone opens the template —
the familiar columns labeled Failure Mode, Effect, Severity, Cause,
Occurrence, Controls, Detection, RPN. For the next four to six hours,
the team fills in rows. They argue about whether the severity should be
a 7 or an 8. They debate occurrence ratings based on gut feeling because
no one has actual failure data. They assign detection scores
optimistically because the inspection step “should catch it.” By the
end, they have a beautiful spreadsheet with 85 rows, a handful of RPNs
above the action threshold, and the satisfying feeling that they have
done their due diligence.
Then the spreadsheet is filed. The product launches. And the failures
that bring production to a halt are the ones nobody thought to include
in the FMEA at all.
This is the FMEA paradox: the most widely used risk analysis tool in
manufacturing has become one of the least effective at actually
preventing failures. Not because the methodology is flawed — it isn’t.
But because the way organizations practice FMEA has drifted so far from
its intent that the tool has become a compliance artifact, a paper
shield, a box to check. And the worst part is that everyone knows it,
but no one wants to say it, because admitting that your FMEA process is
theater means admitting that your risk management is theater, and that
is a conversation nobody wants to have with their customer or their
auditor.
What FMEA Was Supposed to Be
Failure Mode and Effects Analysis was originally developed by the U.S.
military in the late 1940s and later adopted by the aerospace industry
and by NASA for the Apollo program. The concept was straightforward and
powerful: systematically
identify every way a system could fail, assess the consequences of each
failure, and prioritize resources toward preventing the most critical
ones. It was analytical, disciplined, and genuinely preventive.
When the automotive industry adopted FMEA through the AIAG standards
and later the harmonized AIAG/VDA handbook, it became a required
document for every new product and process. PPAP submissions require
FMEAs. IATF 16949 requires them. Customers audit them. Registrars review
them. FMEA went from an engineering analysis tool to a contractual
obligation, and that shift changed everything.
What FMEA Actually Became
The first sign of trouble is timing. In theory, FMEA should be one of
the earliest activities in a new product or process development cycle.
You analyze risks before you commit to a design, before you build
tooling, before you lock in your process flow. In practice, FMEA is
almost always done late — after the design is frozen, after the process
is laid out, after tooling is ordered. At that point, the FMEA isn’t
analyzing risks to prevent them; it’s documenting risks you’ve already
accepted. You can’t change the design because the tooling is already
built. You can’t modify the process because the capital is already
spent. So the FMEA becomes a post-hoc justification rather than a
preventive analysis. You are describing the risks you’ve already decided
to live with.
The second sign is the RPN game. The Risk Priority Number — Severity
× Occurrence × Detection — was supposed to be a prioritization tool. In
practice, it has become a negotiation. Teams spend disproportionate time
arguing over individual ratings, not because they disagree about the
underlying risk, but because they need the RPN to fall below the
customer’s action threshold. If the threshold is 100, you can expect a
suspicious cluster of RPNs at 90, 96, or 98, the largest products of
three 1-to-10 ratings that stay below it. The ratings are massaged, the
detection controls are described more optimistically than they deserve,
and the result is a set of numbers that tell you more about the team’s
desire to avoid additional work than about the actual risk profile of
the product.
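The arithmetic behind the RPN is simple enough to check directly. The sketch below assumes the conventional 1-to-10 integer scale for all three ratings and a hypothetical action threshold of 100; the function and variable names are illustrative only. It lists which RPN values just under the threshold are reachable at all, and shows how two very different risks collapse to the same number.

```python
from itertools import product

SCALE = range(1, 11)  # assumed 1-10 integer scale for S, O, and D
THRESHOLD = 100       # hypothetical customer action threshold


def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: Severity x Occurrence x Detection."""
    return severity * occurrence * detection


# Every RPN value that some combination of ratings can actually produce.
reachable = {rpn(s, o, d) for s, o, d in product(SCALE, repeat=3)}

# Reachable values in the band just under the threshold.
just_under = sorted(v for v in reachable if 90 <= v < THRESHOLD)
print(just_under)  # [90, 96, 98]

# Two very different risks with the same RPN: a safety-level severity with
# rare occurrence versus a minor nuisance that happens constantly.
print(rpn(10, 2, 5), rpn(2, 10, 5))  # 100 100
```

Under these assumptions only 90, 96, and 98 are reachable in the band just below 100, and the product by itself says nothing about which of the three factors is driving the risk.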
The third sign is the scope problem. A thorough FMEA for a moderately
complex product or process should have hundreds of potential failure
modes. In practice, most FMEAs contain 50 to 100 rows because the team
runs out of time, patience, or meeting room availability. The failures
that get documented are the obvious ones — the ones everyone already
knows about. The novel, unexpected, interaction-based failures — the
ones that actually cause new product launches to fail — are the ones
that never make it into the spreadsheet because nobody thought to
imagine them. This is the cruel irony: FMEA is supposed to help you
anticipate the unexpected, but it is inherently limited by what the team
can imagine during a finite meeting.
The fourth sign is the link break. FMEA outputs are supposed to drive
actions — design changes, additional controls, improved detection
methods. In practice, the recommended actions column is filled with
vague entries like “monitor during production” or “provide training” or
“update work instruction.” These are not actions; they are aspirations.
They are the minimum the team can write down to satisfy the form without
committing to actual engineering changes. And even when genuine actions
are identified, there is often no follow-through mechanism. The FMEA is
completed, the spreadsheet is filed in the PPAP package, and the action
items become someone’s responsibility in theory but nobody’s priority in
practice.
The Severity Trap
One of the most dysfunctional patterns in FMEA practice is the
treatment of severity ratings. Severity is supposed to reflect the
impact of the failure mode on the customer or the end user. A failure
that causes safety risk should receive the highest severity rating
regardless of how unlikely it is. But in practice, teams routinely
downweight severity for unlikely events because they don’t want to deal
with the action items that a high-severity, high-RPN failure mode
generates. “It’s never happened before” becomes the justification for a
lower severity, which fundamentally misunderstands the rating. Severity
describes the consequence if the failure occurs, not the probability
that it will occur. That’s what the occurrence rating is for. But when
teams conflate these two dimensions, the entire risk assessment
collapses into a single gut-feel judgment disguised as a structured
analysis.
This is particularly dangerous in safety-critical applications. When
a team assigns a severity of 5 to a failure mode that could cause injury
because “we’ve never had that happen,” they are not analyzing risk —
they are gambling with someone else’s safety and calling it engineering
judgment.
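One way to keep severity from being diluted is to screen on it before any multiplication happens. The following is a minimal sketch, not a rule taken from any standard: the cutoff of 9 for safety-related severity, the threshold of 100, the example failure mode, and the names `FailureMode` and `needs_action` are all assumptions for illustration.

```python
from dataclasses import dataclass

SAFETY_SEVERITY = 9   # assumed cutoff for safety-related effects
RPN_THRESHOLD = 100   # hypothetical customer action threshold


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # consequence if the failure occurs (1-10)
    occurrence: int  # likelihood that it occurs (1-10)
    detection: int   # 1 = almost certain to be caught, 10 = no detection

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


def needs_action(fm: FailureMode) -> bool:
    """Flag on severity alone first, then fall back to the RPN threshold."""
    if fm.severity >= SAFETY_SEVERITY:
        return True  # "it has never happened" belongs in occurrence, not here
    return fm.rpn >= RPN_THRESHOLD


honest = FailureMode("guard interlock bypassed", severity=10, occurrence=1, detection=4)
massaged = FailureMode("guard interlock bypassed", severity=5, occurrence=1, detection=4)

print(honest.rpn, needs_action(honest))      # 40 True
print(massaged.rpn, needs_action(massaged))  # 20 False
```

Both rows sit far below the RPN threshold. Only the severity screen separates them, which is why lowering the severity to 5 makes the item quietly disappear from the action list.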
The Detection Delusion
The detection rating is arguably the most manipulated column in the
entire FMEA. Detection is supposed to assess the likelihood that the
current controls will detect the failure mode before it reaches the
customer. A rating of 1 means the failure is almost certain to be
caught. A rating of 10 means there is essentially no detection
capability.
In practice, teams assign optimistic detection ratings because the
alternative — admitting that you have no reliable way to detect a
failure — triggers uncomfortable conversations about investing in better
inspection, testing, or process monitoring. So “operator visual
inspection” gets a detection rating of 3 or 4, despite the widely cited
rule of thumb that visual inspection catches at best about 85% of
defects under ideal conditions and often far less under production
conditions. “Final inspection” gets credited as a detection control even
when the final inspection is a sampling plan that explicitly accepts a
certain percentage of defects. “SPC” is listed as a detection method for
failure modes that would not cause out-of-control signals on the charts
being used.
Every overoptimistic detection rating artificially deflates the RPN,
which means real risks don’t get the attention they deserve. The
detection column, intended to drive investment in better controls,
instead becomes a tool for avoiding that investment.
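A rough calculation shows how much an optimistic detection assumption can hide. This sketch assumes a simple model in which each inspection stage catches a fixed fraction of the defects presented to it, independently of any other stage. The 85% figure comes from the text above; the incoming defect rate, the 50% production-floor figure, and the function name are hypothetical.

```python
def escaped_ppm(defect_ppm: float, catch_rates: list[float]) -> float:
    """Defective parts per million that pass every inspection stage.

    Assumes each stage independently catches a fixed fraction of the
    defects that reach it.
    """
    remaining = defect_ppm
    for rate in catch_rates:
        remaining *= (1.0 - rate)
    return remaining


incoming = 2000.0  # hypothetical process defect rate, in ppm

ideal = escaped_ppm(incoming, [0.85])      # visual inspection at its best
realistic = escaped_ppm(incoming, [0.50])  # assumed production-floor rate
print(round(ideal), round(realistic))      # 300 1000
```

Even at the best-case 85%, about 300 defective parts per million still reach the customer in this example, which is hard to reconcile with a detection rating of 3 or 4.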
The Update Vacuum
Perhaps the most damaging aspect of FMEA practice is what happens
after the initial document is completed. FMEA is supposed to be a living
document, updated as new failure modes are discovered, as process
changes are implemented, as field data reveals risks that weren’t
anticipated. In reality, most FMEAs are written once, approved once,
filed once, and never meaningfully updated again.
When a new failure occurs in production that wasn’t in the FMEA, the
corrective action report may reference the FMEA update, but the update
is often superficial — a single row added retroactively to document what
happened rather than to prevent what might happen next. The FMEA becomes
a historical record of failures already experienced rather than a
forward-looking analysis of failures yet to come. This is exactly
backwards. An FMEA that catalogs past failures is an expensive
lessons-learned database. An FMEA that anticipates future failures is a
prevention tool. Most organizations have the former and think they have
the latter.
The 2019 AIAG/VDA handbook introduced changes intended to address some
of these issues: the Action Priority (AP) system replaced raw RPN
thresholds, and the emphasis on FMEA as a living document was
reinforced. These are improvements in methodology, but they don’t
address the fundamental cultural problem. Organizations treat FMEA as a
documentation requirement rather than an analytical practice, and a
technical update to the handbook cannot fix a cultural problem.
The Competence Problem
FMEA quality depends directly on the experience and
analytical capability of the team performing it. A team of experienced
engineers who have seen similar products fail, who understand the
physics of the process, who can imagine failure modes that haven’t
occurred yet but are physically possible — that team will produce a
valuable FMEA. A team of engineers who are filling out their first FMEA
template, guided by a facilitator who is focused on completing the
document rather than understanding the risks — that team will produce a
compliance artifact.
Most organizations do not invest in training engineers to think
analytically about failure. They invest in training engineers to fill
out the FMEA form correctly. The emphasis is on the format, the rating
scales, the linkage to other PPAP documents. The emphasis is rarely on
how to think creatively about what could go wrong, how to challenge
assumptions, how to look for interactions between failure modes that
amplify consequences. The form is taught; the thinking is assumed. And
the thinking is where the value is.
What Better Looks Like
Organizations that get real value from FMEA share several
characteristics that have nothing to do with the form and everything to
do with the culture.
First, they do FMEA early. Not after the design is frozen, but during
concept development, when changes are still cheap. They accept that the
FMEA will be incomplete at this stage and that it will be updated as the
design matures. They treat the initial FMEA as a rough draft of the risk
picture, not a finished document to be filed.
Second, they assign ratings honestly. They use data where it exists,
and where it doesn’t, they are conservative rather than optimistic. They
accept that high RPNs or high action priorities are not failures of the
design but invitations to improve it. They do not view a high RPN as a
problem to be argued away but as a risk to be engineered away.
Third, they invest in detection. They do not list “operator
inspection” as a detection control and assign it a favorable rating.
They invest in mistake-proofing, automated inspection, and process
monitoring that genuinely reduces the probability that defects reach the
customer. They understand that detection is not about checking boxes but
about building systems that make it physically difficult for defects to
escape.
Fourth, they update relentlessly. Every customer complaint, every
internal nonconformance, every near-miss triggers a review of the
relevant FMEA. They treat the FMEA as the living memory of the product’s
risk profile, updated in real time as new information emerges. The FMEA
is not a historical document; it is a current picture of what the
organization knows about what could go wrong.
Fifth, they conduct FMEA reviews periodically even when nothing has
gone wrong. They bring fresh eyes — engineers from other product lines,
quality engineers from other plants, suppliers with knowledge of similar
applications — and ask them to challenge the existing analysis. This is
uncomfortable, because it means admitting that the previous analysis
might have missed something. But it is far more uncomfortable to
discover what you missed through a customer complaint or a field
failure.
The Honest Question
If your organization’s FMEAs have never identified a failure mode
that surprised you, if every failure mode in your FMEA is one that
everyone already knew about before the meeting started, if the
recommended actions column is filled with “monitor” and “train” rather
than design changes and process improvements — then your FMEA process is
not analyzing risk. It is documenting what you already know in a format
that satisfies your customer’s documentation requirements.
That is not risk management. That is risk theater. And the difference
between the two is the difference between catching a failure before it
reaches your customer and explaining to your customer why the failure
that “nobody could have predicted” was documented in a spreadsheet that
nobody read.
The choice is not between doing FMEA and not doing FMEA. The choice
is between doing FMEA as a thinking exercise that genuinely changes how
you design and manufacture products, or doing FMEA as a paperwork
exercise that changes nothing except your compliance score. One prevents
failures. The other prevents audit findings. They are not the same
thing, and the organizations that confuse them are the ones that
eventually learn the difference the hard way.
Peter Stasko is a Quality Architect with over 25
years of experience in manufacturing quality systems, process
optimization, and continuous improvement. He has implemented and audited
quality management systems across automotive, aerospace, and industrial
manufacturing sectors, and he writes about the patterns he has observed
in organizations that struggle to turn quality tools into quality
outcomes.