FMEA: When Your Risk Assessment Becomes Elaborate Theater Instead of Actual Prevention — and the Failures You Documented Became the Failures You Still Experienced

Failure Mode and Effects Analysis is one of the most powerful
prevention tools in the quality professional’s arsenal. It is also one
of the most routinely misused. Across manufacturing plants worldwide,
cross-functional teams gather in conference rooms, fill out spreadsheets
with severity, occurrence, and detection ratings, calculate risk
priority numbers, and file the completed FMEA away in a digital cabinet
where it is never seen again until an auditor asks for it.

The irony is almost poetic. A tool designed to anticipate and prevent
failures has become, in many organizations, a post-failure paperwork
exercise that prevents nothing. The FMEA that should have caught the
design flaw before production instead gets updated after the customer
complaint — its rows retroactively populated with the failure that
already happened, its risk priority numbers adjusted to reflect the
reality that everyone already lived through. The document is accurate.
The prevention is nonexistent.

The Anatomy of a Living Prevention Tool

A proper FMEA is not a form. It is a living engineering analysis that
drives design and process decisions. It begins before the design is
frozen, before the process is finalized, before the first part is
produced. Its purpose is singular: to identify what could go wrong,
assess how bad it would be, estimate how likely it is, evaluate whether
you would catch it, and then — critically — do something about it.

The structure is straightforward. For each component, process step,
or function, you list the potential failure modes. For each failure
mode, you describe the effect. You rate severity on a scale of one to
ten. You rate occurrence on a scale of one to ten. You rate detection,
your ability to catch the failure before it reaches the customer, on a
scale of one to ten, where a low number means the failure is likely to
be caught. You multiply the three ratings together to get a risk
priority number (RPN), or in modern versions, you use an action priority
classification of high, medium, or low.
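
As a minimal sketch of the arithmetic above, the Python below multiplies the three ratings into an RPN. The FailureMode class, its field names, and the example ratings are illustrative assumptions, not part of any standard. An action priority lookup is left out because it depends on rating tables that are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One FMEA row; the names and values used here are hypothetical."""
    description: str
    severity: int    # 1 = negligible effect, 10 = most severe effect
    occurrence: int  # 1 = very unlikely, 10 = almost certain
    detection: int   # 1 = almost certain to be caught, 10 = will not be caught

    def __post_init__(self) -> None:
        # Reject ratings outside the one-to-ten scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


weld_crack = FailureMode(
    "Weld cracks under vibration", severity=8, occurrence=3, detection=5
)
print(weld_crack.rpn)  # 8 * 3 * 5 = 120
```

Because the RPN is a product, very different risks can share the same number: ratings of 10, 2, 2 and ratings of 2, 5, 4 both multiply to 40.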

Then comes the part most organizations skip: you define corrective
actions, assign owners, set deadlines, and actually implement the
changes. You re-rate the risk after the actions are complete. The RPN
goes down, or it does not, and you iterate.

Where It Goes Wrong

The first and most common failure mode of FMEA itself is treating it
as a compliance exercise. The automotive standard requires an FMEA, so
an FMEA is produced. The medical device regulation requires an FMEA, so
an FMEA is produced. The customer audit checklist asks for an FMEA, so
an FMEA is produced. In each case, the goal becomes possessing the
document, not executing the analysis. The difference is everything.

A compliance-driven FMEA has recognizable characteristics. It is
completed after the design is frozen, sometimes after production has
started. It lists failure modes that are obvious and non-controversial.
It assigns severity, occurrence, and detection ratings that cluster in
the middle of the scale — fours, fives, and sixes — because nobody wants
to be the engineer who rated their own design a nine. It identifies
recommended actions that are vague: “improve process control” or
“enhance operator training.” It has no owners, no deadlines, and no
follow-up. Its risk priority numbers are calculated with precision and
ignored with enthusiasm.
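
One rough screen for the clustering described above is to measure how many ratings sit in the four-to-six band. The sketch below assumes the ratings have already been pulled out of the spreadsheet into a list. The function name and the sample values are hypothetical, and a high share is only a prompt to look closer, not proof of a compliance-driven FMEA.

```python
def mid_scale_share(ratings, low=4, high=6):
    """Fraction of ratings that fall in the inclusive band [low, high]."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("no ratings supplied")
    in_band = sum(1 for r in ratings if low <= r <= high)
    return in_band / len(ratings)


# Hypothetical severity, occurrence, and detection ratings from one FMEA.
all_ratings = [5, 4, 6, 5, 5, 4, 6, 5, 7, 4, 5, 6]
share = mid_scale_share(all_ratings)
print(f"{share:.0%} of ratings fall between 4 and 6")
```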

The second failure mode is the expertise gap. An FMEA is only as good
as the team that creates it. When the session is populated solely by
quality engineers who were not involved in the design, the analysis
captures what quality engineers can imagine going wrong — which is a
subset of what can actually go wrong. The design engineer who knows that
the tolerance stack-up is marginal in one specific condition is the
person who needs to be in the room. The operator who has seen the part
crack during assembly in a way the drawings do not predict is the person
who needs to speak. The supplier who knows their material batch
consistency has a tail risk is the person who needs to contribute.

When the right people are not in the room, the FMEA becomes a
reflection of organizational blind spots rather than a map of actual
risks. The failure modes that are documented are the ones that are safe
to discuss. The ones that would require uncomfortable conversations
about design choices, supplier selections, or budget constraints are the
ones that remain unwritten — and those are precisely the failures most
likely to occur.

The Rating Trap

The one-to-ten scales for severity, occurrence, and detection seem
objective. They are not. They are deeply subjective assessments colored
by organizational politics, individual experience, and the natural human
tendency toward optimism.

Severity ratings are inflated or deflated depending on who is in the
room. A failure mode that would cause a safety issue might be rated a
six instead of an eight because acknowledging it as an eight would
trigger expensive design changes. A cosmetic defect might be rated a
seven because the customer has been vocal about appearance, even though
the actual functional impact is negligible.

Occurrence ratings are perhaps the most manipulable numbers in all of
quality engineering. The scale asks you to estimate the probability of a
failure mode occurring, but on what data is that estimate based? If
the process is new, there is no data. If the process already exists, the
historical data may not account for the specific design change being
analyzed. Engineers routinely rate occurrence as a two or three —
meaning rare — for failure modes they have never actually tested, based
on nothing more than professional confidence. The difference between an
occurrence rating of two and one of four is often the difference between
an RPN that triggers action and one that does not, which means an
untested assumption is making real engineering decisions.
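
To make that sensitivity concrete, here is a small worked example in Python. The severity and detection values and the action threshold of 100 are hypothetical numbers chosen for illustration. Organizations set their own trigger rules, and some do not use a fixed RPN cutoff at all.

```python
ACTION_THRESHOLD = 100  # hypothetical cutoff; real programs set their own rules


def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk priority number as the product of the three ratings."""
    return severity * occurrence * detection


severity, detection = 7, 4  # held fixed so only occurrence varies
results = {}
for occurrence in (2, 3, 4):
    value = rpn(severity, occurrence, detection)
    needs_action = value >= ACTION_THRESHOLD
    results[occurrence] = (value, needs_action)
    print(f"occurrence={occurrence}: RPN={value}, action required={needs_action}")
```

With these numbers, occurrence ratings of two and three give RPNs of 56 and 84, both under the cutoff, while a rating of four gives 112 and crosses it. Nothing about the part changed between those rows except an estimate.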

Detection ratings suffer from a different problem: they measure the
effectiveness of controls that may not exist yet. Rating detection
assumes you have defined your inspection method, your sampling
frequency, your test protocol. In practice, detection is often rated
based on the assumption that “we will figure out how to inspect this
later.” Later, of course, does not always arrive. The detection rating
of three becomes the justification for not adding the extra test step,
and the failure that should have been caught by the test that was never
implemented becomes the warranty claim that nobody predicted.
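
One possible guard against rating a control that does not exist yet is to let a detection rating count only once the control is actually running. The sketch below is an assumption about how a team might encode that rule, not an established requirement. The DetectionControl class, its fields, and the leak-test example are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

NO_CONTROL_RATING = 10  # worst value on the one-to-ten detection scale


@dataclass
class DetectionControl:
    method: str
    claimed_rating: int  # the rating the team wrote down
    implemented: bool    # True only once the control is running in production


def effective_detection(control: Optional[DetectionControl]) -> int:
    """Return the detection rating to use in the risk calculation today."""
    if control is None or not control.implemented:
        # A planned control catches nothing, so assume the worst case.
        return NO_CONTROL_RATING
    return control.claimed_rating


planned = DetectionControl("End-of-line leak test", claimed_rating=3, implemented=False)
print(effective_detection(planned))  # 10 until the test actually exists
```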

The Action Gap

The most critical failure in FMEA practice is the gap between
identifying risk and acting on it. Many organizations are genuinely good
at the identification phase. They can list failure modes, rate them,
calculate RPNs, and produce beautiful spreadsheets that would impress
any auditor. Then they stop.

The recommended actions column is filled with intentions. “Add
process control.” “Improve fixture design.” “Increase sampling
frequency.” These are not actions. They are aspirations. An action has
an owner, a deadline, a specific deliverable, and a verification method.
“John will design and validate a new fixture with increased clamping
force by March 15, and the validation report will demonstrate a 50%
reduction in part misalignment” is an action. “Improve fixture design”
is a wish.
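
The four elements of an action can be checked mechanically. The following sketch assumes a simple record type. The RecommendedAction class and its field names are hypothetical, and the year in the deadline is a placeholder, since the example above gives only a month and day.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

REQUIRED_FIELDS = ("owner", "deadline", "deliverable", "verification")


@dataclass
class RecommendedAction:
    description: str
    owner: Optional[str] = None
    deadline: Optional[date] = None
    deliverable: Optional[str] = None
    verification: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Names of the required fields that are still empty."""
        return [name for name in REQUIRED_FIELDS if not getattr(self, name)]


wish = RecommendedAction("Improve fixture design")
action = RecommendedAction(
    description="Design and validate a new fixture with increased clamping force",
    owner="John",
    deadline=date(2025, 3, 15),  # the year is a placeholder
    deliverable="Validation report",
    verification="Report shows a 50% reduction in part misalignment",
)
print(wish.missing_fields())    # all four fields are missing
print(action.missing_fields())  # empty list
```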

The organizations that get value from FMEA are the ones that treat
the recommended actions with the same rigor they would treat a customer
corrective action request. Actions are tracked in project management
systems. Completion is verified, not self-reported. Effectiveness is
measured by re-rating the risk after the action is implemented and
confirming that the risk actually decreased. If the occurrence rating
was a six and the recommended action was to add a poka-yoke device, the
post-action occurrence rating should reflect the actual performance data
showing that the device reduced the failure rate. If it did not, the
action was ineffective and a new approach is needed.
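
A minimal way to check effectiveness against performance data is to compare defect rates before and after the action. The counts and the 50% target below are hypothetical. The comparison also ignores sampling noise, which a real verification with small samples would need to address.

```python
def defect_rate(defects: int, units: int) -> float:
    """Observed defects per unit produced."""
    if units <= 0 or not 0 <= defects <= units:
        raise ValueError("need units > 0 and 0 <= defects <= units")
    return defects / units


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop in defect rate; negative if the rate got worse."""
    if before <= 0:
        raise ValueError("baseline rate must be positive")
    return (before - after) / before


TARGET_REDUCTION = 0.50  # hypothetical target agreed when the action was defined

before = defect_rate(defects=30, units=5000)  # hypothetical pre-action data
after = defect_rate(defects=6, units=5000)    # hypothetical post-action data
reduction = relative_reduction(before, after)
effective = reduction >= TARGET_REDUCTION
print(f"Reduction: {reduction:.0%}, action effective: {effective}")
```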

Process FMEA vs. Design FMEA: Two Different Languages

A common organizational mistake is treating Design FMEA and Process
FMEA as interchangeable exercises with different column headers. They
are fundamentally different analyses asking different questions.

The Design FMEA asks: given this design, what could fail, and how can
the design be made more robust? Its outputs should drive design changes
— material substitutions, geometry modifications, tolerance allocations,
redundant features. When a Design FMEA identifies a high-severity,
high-occurrence failure mode, the correct response is not to add a
process control. The correct response is to change the design so the
failure mode cannot occur.

The Process FMEA asks: given this design (which is now frozen), how do
we manufacture it without introducing defects? Its outputs should drive
process changes — fixture improvements, control plan modifications,
inspection method selections, operator training programs. When a Process
FMEA identifies a high-risk failure mode, the correct response is to
modify the process to prevent or detect it.

Conflating the two leads to the common anti-pattern of using Process
FMEA to compensate for design deficiencies that should have been
addressed in the Design FMEA. The part's wall is too thin, the Design
FMEA flagged it, the design was not changed, and now the Process FMEA is
loaded with expensive controls to compensate. The
organization spends its way around a design problem it refused to solve
at the source.

The Revision Problem

An FMEA that is completed once and never revised is a snapshot of
organizational knowledge at a single point in time. It is immediately
stale. Every customer complaint, every internal defect, every near-miss,
every process change, every design revision should trigger a review of
the relevant FMEA. In practice, most FMEAs are updated only when forced
— typically during a quality audit or a customer-required periodic
review.

The revision problem compounds over time. An FMEA created three years
ago for a product that has undergone seven engineering changes may bear
little resemblance to the product currently being manufactured. The
failure modes that were identified may no longer be relevant. New
failure modes introduced by the design changes may be absent. The
process controls referenced in the FMEA may have been modified,
eliminated, or replaced. The document that was once a living analysis
has become a historical artifact that actively misleads anyone who
relies on it.

Effective organizations build FMEA revision into their change
management processes. Every engineering change request triggers a review
of the affected FMEA rows. Every significant quality event triggers a
retrospective review: was this failure mode identified? If yes, why was
the prevention ineffective? If no, why was it missed? The lessons are
fed back into the analysis, and the FMEA improves with each cycle.
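
The change-management trigger can be sketched as a simple lookup from changed items to FMEA rows. The code below assumes each row records the component or process step it analyzes and that an engineering change request lists the items it touches. The FmeaRow type, the row contents, and the matching rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FmeaRow:
    row_id: str
    item: str          # component or process step the row analyzes
    failure_mode: str


def rows_to_review(rows: Iterable[FmeaRow], changed_items: Iterable[str]) -> List[FmeaRow]:
    """Return the rows whose item is named in the change request."""
    changed = {name.strip().lower() for name in changed_items}
    return [row for row in rows if row.item.strip().lower() in changed]


fmea = [
    FmeaRow("R1", "Housing", "Wall cracks during assembly"),
    FmeaRow("R2", "Clamp fixture", "Part misaligned in fixture"),
    FmeaRow("R3", "Housing", "Sink marks on visible surface"),
]
ecr_changed_items = ["housing"]  # items listed on a hypothetical change request
flagged = rows_to_review(fmea, ecr_changed_items)
print([row.row_id for row in flagged])  # ['R1', 'R3']
```

Matching on item names is the simplest possible rule. It will miss changes that affect a row indirectly, so it supplements the engineering review of each change and does not replace it.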

FMEA in the Age of Data

Modern manufacturing generates volumes of process data that the
original FMEA methodologists could not have imagined. Statistical
process control, machine learning anomaly detection, real-time sensor
networks — all of these provide empirical data that can validate or
challenge the assumptions baked into an FMEA.

The occurrence rating you estimated based on engineering judgment can
now be compared against actual process data. If you rated a failure mode
occurrence as a two and your defect data shows it occurs at a rate
consistent with a five, your FMEA is wrong and needs to be corrected.
The detection rating you assigned based on your planned inspection
protocol can now be evaluated against actual inspection effectiveness
data. If your detection system is catching 95% of defects, your
detection rating should reflect that. If it is catching 60%, a favorable
detection rating is aspirational.
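
A comparison like this needs a mapping from observed defect rate to rating. The sketch below uses a made-up mapping: the rate boundaries in RATE_UPPER_BOUNDS are hypothetical placeholders, not values from any standard, and should be replaced with whatever scale your organization actually uses. The defect and inspection counts are also invented for illustration.

```python
from bisect import bisect_left

# HYPOTHETICAL upper bounds on failure rate for ratings 1 through 9.
# Any rate above the last bound is rated 10. Replace with your own scale.
RATE_UPPER_BOUNDS = [1e-6, 1e-5, 1e-4, 5e-4, 2e-3, 1e-2, 2e-2, 5e-2, 1e-1]


def occurrence_rating(defects: int, units: int) -> int:
    """Map an observed defect rate onto the one-to-ten occurrence scale."""
    if units <= 0 or not 0 <= defects <= units:
        raise ValueError("need units > 0 and 0 <= defects <= units")
    rate = defects / units
    return bisect_left(RATE_UPPER_BOUNDS, rate) + 1


def detection_effectiveness(caught: int, escaped: int) -> float:
    """Share of known defects that the control caught before shipment."""
    total = caught + escaped
    if total == 0:
        raise ValueError("no defects recorded")
    return caught / total


estimated = 2  # rating assigned by engineering judgment
observed = occurrence_rating(defects=6, units=5000)  # observed rate of 0.0012
if observed > estimated:
    print(f"Occurrence was rated {estimated}, data suggests {observed}: revise the FMEA")

print(f"Detection effectiveness: {detection_effectiveness(caught=57, escaped=3):.0%}")
```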

Some organizations are beginning to integrate FMEA with digital twin
simulations, using Monte Carlo methods to generate probability
distributions for occurrence ratings instead of relying on single-point
estimates. This is an improvement, but it is still only as good as the
failure modes that were identified. A sophisticated probability model
applied to an incomplete set of failure modes produces precise answers
to the wrong questions.
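
To show what a distribution in place of a single-point estimate can look like, the sketch below draws plausible failure rates from observed counts. It is not a digital twin: a simple Beta-binomial model with a uniform prior stands in for a full simulation, and that modeling choice, the function name, and the counts are all assumptions made for illustration.

```python
import random


def occurrence_rate_interval(defects, units, draws=10_000, seed=0):
    """Sample plausible failure rates; return the 5th, 50th, 95th percentiles.

    Assumes a binomial process with a uniform Beta(1, 1) prior, so the
    posterior is Beta(defects + 1, units - defects + 1).
    """
    if units <= 0 or not 0 <= defects <= units:
        raise ValueError("need units > 0 and 0 <= defects <= units")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    samples = sorted(
        rng.betavariate(defects + 1, units - defects + 1) for _ in range(draws)
    )
    return tuple(samples[int(q * (draws - 1))] for q in (0.05, 0.50, 0.95))


low, median, high = occurrence_rate_interval(defects=6, units=5000)
print(f"5%: {low:.5f}  median: {median:.5f}  95%: {high:.5f}")
```

The width of the interval is the useful part. A failure mode whose plausible rate spans several rating bands deserves more data before anyone defends a single number.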

The Cultural Dimension

Ultimately, the effectiveness of FMEA is a cultural question. In
organizations where people are punished for raising concerns, the FMEA
will document only safe, non-threatening failure modes. In organizations
where design engineers are incentivized to minimize risk ratings to
avoid costly changes, the severity and occurrence numbers will be
systematically underestimated. In organizations where quality is viewed
as a department rather than a shared responsibility, the FMEA will be
quality’s document, not engineering’s tool.

The organizations that derive genuine value from FMEA share common
cultural traits. Cross-functional participation is expected, not
invited. Raising uncomfortable failure modes is rewarded, not resented.
Risk ratings are challenged and debated, not rubber-stamped. Recommended
actions are tracked with the same discipline as production schedules.
The FMEA is revised continuously, not updated under duress.

These organizations also share a pragmatic understanding: FMEA does
not prevent all failures. It reduces the probability and impact of
failures that could have been anticipated. It is a structured method for
applying collective engineering knowledge to the problem of what could
go wrong. It is not a crystal ball. It will not predict the failure mode
that nobody imagined. But it will predict most of the failure modes that
experienced engineers could have imagined if they had been given the
time, the structure, and the organizational permission to think
carefully about what could go wrong.

And that — the act of thinking carefully, systematically, and
honestly about failure before it happens — is worth more than any
spreadsheet.

Peter Stasko is a Quality Architect with over 25
years of experience in manufacturing quality systems, process
optimization, and continuous improvement. He writes about the real-world
intersection of quality engineering, organizational behavior, and the
cognitive traps that undermine even the best-intentioned quality
programs.
