FMEA: When Your Organization Stops Being Surprised by Failures and
Starts Predicting Them — and the Risks You Imagined Before They Happened
Became the Problems You Never Had to Face

The Failure Nobody Saw Coming

In 2018, a Tier 1 automotive supplier in Slovakia shipped 40,000 fuel
injector assemblies to a major German OEM. Every single one passed final
inspection. The dimensional data was perfect. The functional test
results were flawless. The PPAP documentation was impeccable.

Fourteen months later, three cars caught fire on the Autobahn.

The root cause was a microscopic crack that formed during assembly
when a technician pressed a retaining clip at a specific angle — an
angle that only occurred on the night shift when a particular fixture
had worn past its maintenance interval. The crack propagated under
thermal cycling until fuel vapor escaped and found an ignition
source.

The supplier had performed a process FMEA. They had identified 47
failure modes. They had ranked risks using severity, occurrence, and
detection scores. They had a cross-functional team and a facilitator and
a spreadsheet that took three weeks to complete.

They missed the one that mattered.

Not because they were incompetent. Not because they cut corners. They
missed it because they were human — and FMEA, when done mechanically, is
one of the most elaborate ways a competent team can convince itself it
has thought of everything while systematically missing the thing that
will actually hurt them.

This is the story of how to do FMEA the way it was meant to be done —
not as a paperwork exercise, not as a compliance checkbox, but as the
most powerful risk-thinking tool your organization will ever use. If
you’re willing to actually use it.


What FMEA Actually Is, and What It Isn’t

Failure Mode and Effects Analysis is, at its core, a structured
conversation about what could go wrong.

That’s it. That’s the whole thing.

Every other definition — the risk priority numbers, the
severity-occurrence-detection matrices, the formal linkage to APQP and
PPAP, the AIAG-VDA harmonized handbook with its 168 pages of guidance —
all of that is scaffolding. Useful scaffolding, sometimes necessary
scaffolding, but scaffolding nonetheless. The building underneath is a
conversation about failure.

The problem is that most organizations treat the scaffolding as the
building.

They sit in a conference room with a facilitator who opens a
spreadsheet template, goes down a list of process steps, and asks the
team to brainstorm failure modes for each one. The team, exhausted from
production, throws out the obvious risks. The facilitator fills in
severity, occurrence, and detection scores. They multiply them together
to get a Risk Priority Number. They sort by RPN. They address the top
five. They close the FMEA. They file it. They never look at it
again.

This is not FMEA. This is risk theater.

Real FMEA is what happens when a team of people who deeply understand
the process sit together and ask, with genuine curiosity and without
blame: “What are all the ways this could fail? What would happen if it
did? How likely is that? Would we even know before it reached the
customer? And what are we going to do about the answers we don’t
like?”

The spreadsheet is just a way to capture that conversation so it
doesn’t evaporate.


The Three Faces of FMEA

FMEA is not one tool. It’s three tools wearing the same hat.

System FMEA looks at the product as a whole. What
are the system-level functions, and how could each one fail? This is
where you catch the architecture problems — the interfaces between
subsystems that create vulnerabilities neither subsystem owner would
think to check. System FMEA is done early, during concept and design,
when changing your mind is cheap.

Design FMEA zooms into the product design itself.
For every component, every material choice, every tolerance, every
interface: what could go wrong, and what would the consequences be?
Design FMEA is where you discover that the seal material you specified
degrades at a temperature that’s within the operating range of the
component sitting next to it. It’s where you realize the tolerance
stack-up creates a gap that only appears when three dimensions are
simultaneously at their worst-case limits, a condition that’s
statistically rare in your samples but all but inevitable somewhere
across a million units.

Process FMEA examines the manufacturing process. For
every operation, every handling step, every transfer point: what could
go wrong, and how would the product be affected? Process FMEA is where
you catch the worn fixture, the ambiguous work instruction, the gauge
that can’t actually detect the defect you’re worried about.

All three follow the same logic. All three ask the same questions.
All three fail in the same ways when people go through the motions
instead of doing the thinking.


The Anatomy of a Real FMEA

Let me walk you through how a real FMEA works — not the textbook
version, but the version that actually prevents failures.

Step 1: Define the Scope With Surgical Precision

“Let’s do an FMEA on the fuel injector assembly” is too broad. You’ll
drown in failure modes and produce a document so large nobody will ever
read it.

Instead: “We’re doing an FMEA on the retaining clip installation
operation, including clip feeding from the bowl feeder through final
seating verification.” Now you have a scope tight enough to be thorough
but bounded enough to be manageable.

The scope decision is itself a risk judgment. Scope too narrow, and
you miss interface failures. Scope too broad, and you exhaust your team
before you reach the critical risks.

Step 2: Map the Process or Structure

Before you brainstorm failure modes, you need to understand what
you’re analyzing. For a process FMEA, that means a detailed process flow
diagram — not the high-level one from your control plan, but the real
one that shows every handling step, every transfer, every wait queue,
every rework loop.

For a design FMEA, it means a block diagram or boundary diagram that
shows every component, every interface, and every energy, material, and
signal flow between them.

This step alone often reveals risks. When the process engineer draws
the flow diagram and the operator says, “That’s not how we actually do
it,” you’ve just found a gap between procedure and practice — a gap that
is itself a failure mode.

Step 3: Brainstorm Failure Modes (This Is Where Most Teams Fail)

A failure mode is simply a way the process step or design element
could fail to deliver its intended function. Not the cause. Not the
effect. The failure itself.

“Retaining clip not fully seated.” That’s a failure mode.

“Clip jams in the bowl feeder” is a cause. “Fuel leak at the injector
body” is an effect. Mixing these up is the single most common FMEA
error, and it cascades through everything that follows.

The brainstorming is where the magic lives or dies. Mechanical FMEA
looks at each step and asks, “What could go wrong here?” Real FMEA asks
a much richer set of questions:

  • What could go wrong during normal operation?
  • What could go wrong during setup or changeover?
  • What could go wrong if the operator is tired, new, distracted, or
    working from memory instead of the instruction?
  • What could go wrong if incoming material is at the edge of its
    specification?
  • What could go wrong if the environment changes — temperature,
    humidity, vibration?
  • What could go wrong during maintenance or after maintenance?
  • What could go wrong that we wouldn’t detect with our current
    controls?
  • What has gone wrong before — here, or at similar operations in other
    plants?
  • What hasn’t gone wrong yet but is statistically inevitable at our
    volume?

This last question is the one that separates good FMEA from great
FMEA. If you make a million parts and a particular failure has a
one-in-a-million chance per part, you should expect to see it about
once. Have you prepared for that once?
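
To put a number on "statistically inevitable," here is a minimal Python sketch. It assumes every part fails independently with the same probability, which real processes rarely honor exactly, and the function names are my own illustration rather than anything from an FMEA standard.

```python
import math


def expected_failures(units: int, p_fail: float) -> float:
    """Expected number of failures across `units` independent parts."""
    return units * p_fail


def prob_at_least_one(units: int, p_fail: float) -> float:
    """Probability of one or more failures, assuming independent parts."""
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(units * math.log1p(-p_fail))


units = 1_000_000
p_fail = 1e-6  # one-in-a-million chance per part

print(f"expected failures: {expected_failures(units, p_fail):.2f}")
print(f"chance of at least one: {prob_at_least_one(units, p_fail):.1%}")
```

Under these assumptions the expected count is one, and the chance of seeing at least one failure is roughly 63%. Double the volume and it climbs to about 86%.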

Step 4: Assess Severity (The Voice of the Customer)

Severity is the rating of how badly it would go if the failure mode
occurred and reached the customer. Not how likely it is. How bad it
would be.

This is where FMEA connects to your customer’s world. A severity 10
means potential for injury or death, or noncompliance with government
regulations. A severity 5 means the product still functions but with
noticeable degradation. A severity 1 means the effect is so minor the
customer would never notice.

The critical insight: severity is the one rating you can’t engineer
away with better controls. You can reduce occurrence with better process
design. You can improve detection with better inspection. But if a
retaining clip fails and fuel vapor escapes into a hot engine bay, the
severity of that outcome doesn’t change regardless of how unlikely it
is. Only a design change that alters the effect itself can lower it.

This is why the AIAG-VDA approach emphasizes severity first.
High-severity failure modes deserve attention regardless of their
likelihood. The supplier who missed the cracked retaining clip had given
it a severity of 8 — serious but not catastrophic — because they hadn’t
thought through the full consequence chain all the way to the end
customer. They stopped at “clip not fully seated → potential leak at
injector body” and never asked, “What happens when that leaked fuel
vapor finds an ignition source in a moving vehicle?”

Step 5: Assess Occurrence (The Voice of the Process)

Occurrence is how likely the failure mode is to happen, given your
current process design and controls.

This is where process knowledge matters more than any spreadsheet.
The engineer who has been running this line for five years has a
calibrated intuition for occurrence that no scoring guide can replicate.
Listen to that person.

The trap: teams anchor on historical data. “We’ve never seen that
failure” gets translated to an occurrence rating of 1. But “we’ve never
seen it” and “it can’t happen” are very different statements. If you’ve
made 50,000 units and never seen a particular failure, the most you can
say with 95% confidence is that the rate is below roughly 1 in 17,000
(the statistical “rule of three”). It tells you nothing about whether
the true rate is 1 in 100,000, in which case you should expect about
five failures in the next half-million units.
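
A quick calculation shows how little a clean record proves. The sketch below assumes independent units with a constant failure rate (a simplification), and `zero_failure_upper_bound` is a name I made up for illustration. It computes the highest per-unit failure rate that is still consistent, at a chosen confidence level, with seeing zero failures so far.

```python
def zero_failure_upper_bound(units: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on the per-unit failure rate after
    `units` parts with zero observed failures."""
    # Solve (1 - p)^units = 1 - confidence for p
    return 1.0 - (1.0 - confidence) ** (1.0 / units)


bound = zero_failure_upper_bound(50_000)
next_run = 500_000

print(f"rate could still be as high as 1 in {1 / bound:,.0f}")
print(f"failures in next {next_run:,} units at that rate: {next_run * bound:.0f}")
```

With 50,000 clean units, the bound lands near 1 in 17,000, which would mean around 30 failures in the next half-million. That is the worst case the data cannot exclude, not a prediction, but it is a long way from "occurrence 1."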

Step 6: Assess Detection (The Voice of Your Controls)

Detection is the rating of how likely your current controls are to
catch the failure mode before it reaches the customer.

This is where most organizations are dishonest with themselves. They
list their controls — final inspection, SPC charts, visual checks — and
give themselves optimistic detection scores. But detection needs to be
evaluated for the specific failure mode, not for the general quality of
your inspection system.

“Can your current controls detect a microscopic crack in a retaining
clip that only appears when the fixture is worn past its maintenance
interval and the operator presses at a specific angle during the night
shift?”

If your answer is “we do a visual inspection at final,” then your
detection score should reflect that a microscopic crack is probably not
visible to the naked eye during a high-volume final inspection. Your
detection score should be high (meaning poor detection), not because
your inspection is bad, but because your inspection was never designed
to catch this specific failure.

Step 7: Prioritize and Act

The AIAG-VDA harmonized approach uses Action Priority (High, Medium,
Low) rather than the traditional RPN multiplication. This is an
improvement. RPN had the mathematical illusion of precision — a severity
8 × occurrence 2 × detection 2 gave the same 32 as severity 2 ×
occurrence 8 × detection 2, even though the first scenario (rare but
catastrophic) demands a completely different response than the second
(frequent but minor).

Action Priority tables force you to look at the combination of
ratings and make a judgment call. High severity with any meaningful
occurrence or poor detection? High priority. Period.
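
The RPN collision is easy to reproduce. In the sketch below, `simplified_priority` is a hypothetical severity-first rule with thresholds I picked for illustration. It is not the AIAG-VDA Action Priority table, which you should take from the handbook itself.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    severity: int    # 1 (customer never notices) to 10 (injury or regulatory)
    occurrence: int  # 1 (very unlikely) to 10 (very frequent)
    detection: int   # 1 (reliably caught) to 10 (effectively undetectable)


def rpn(r: Rating) -> int:
    """Traditional Risk Priority Number: a plain product of the three ratings."""
    return r.severity * r.occurrence * r.detection


def simplified_priority(r: Rating) -> str:
    """Illustrative severity-first rule; thresholds are assumptions."""
    if r.severity >= 8 and (r.occurrence >= 2 or r.detection >= 5):
        return "High"
    if r.severity >= 5 and (r.occurrence >= 4 or r.detection >= 5):
        return "Medium"
    return "Low"


rare_but_serious = Rating(severity=8, occurrence=2, detection=2)
frequent_but_minor = Rating(severity=2, occurrence=8, detection=2)

for r in (rare_but_serious, frequent_but_minor):
    print(r, "RPN:", rpn(r), "priority:", simplified_priority(r))
```

Both ratings produce an RPN of 32, yet the severity-first rule separates them: the first comes out High, the second Low.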

Then comes the hardest part: actually doing something about it.


The Action Plan: Where FMEA Lives or Dies

An FMEA without actions is a diary of risks you chose to accept. It’s
a document that says, “We knew this could fail, and we decided not to
prevent it.”

Every high-priority and medium-priority failure mode needs an action
plan with three components:

  1. What you’re going to do — Be specific. “Improve
    process” is not an action. “Add a proximity sensor that verifies clip
    seating depth within ±0.2mm before the assembly advances to the next
    station” is an action.

  2. Who is responsible — A name. Not a department. A
    specific human being who will be accountable for making this
    happen.

  3. When it will be done — A date. Not “Q3.” May 31,
    2026.

And then — this is the part most organizations skip — you come back
after the action is implemented and re-evaluate the severity,
occurrence, and detection ratings. Did the action actually reduce the
risk? If you added a sensor, what’s the new detection rating? If you
redesigned the fixture, what’s the new occurrence rating?

The re-evaluation is the proof that your FMEA was real. If the
ratings didn’t change, the action didn’t work.
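
One way to keep yourself honest is to make the action record refuse to look finished until it has been re-evaluated. This is a sketch only: the `Action` class, the owner name, and the rating values are all hypothetical, and a real FMEA tool will have its own fields.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

Ratings = Tuple[int, int, int]  # (severity, occurrence, detection)


@dataclass
class Action:
    description: str  # what, stated specifically
    owner: str        # who: a named person, not a department
    due: date         # when: a calendar date, not a quarter
    before: Ratings
    after: Optional[Ratings] = None  # filled in only at re-evaluation

    def status(self) -> str:
        if self.after is None:
            return "not re-evaluated"
        # The action counts only if at least one rating actually dropped
        if any(new < old for new, old in zip(self.after, self.before)):
            return "risk reduced"
        return "no measurable effect"


sensor = Action(
    description="Add proximity sensor verifying clip seating depth within ±0.2mm",
    owner="J. Novak",  # hypothetical name
    due=date(2026, 5, 31),
    before=(10, 3, 8),  # illustrative ratings
)
print(sensor.status())
sensor.after = (10, 3, 2)  # detection improved after the sensor went in
print(sensor.status())
```

An action with unchanged ratings reports "no measurable effect," which is the uncomfortable answer the re-evaluation exists to surface.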


The Five FMEA Sins That Kill Quality Programs

After twenty-five years of reviewing FMEAs across automotive,
aerospace, and pharmaceutical companies, I’ve seen the same patterns
repeat. Here are the five sins that transform a powerful tool into
expensive shelfware.

Sin 1: Doing FMEA Alone. The design engineer fills
out the design FMEA. The process engineer fills out the process FMEA.
Nobody talks to operations, maintenance, or the customer. The result is
an FMEA that reflects one person’s blind spots instead of the team’s
collective insight.

Sin 2: Copy-Paste FMEA. Last year’s FMEA becomes
this year’s FMEA with a new date in the header. The failure modes are
the same. The ratings are the same. The actions are the same. Nothing
was learned. Nothing changed. The document grew older but not wiser.

Sin 3: The RPN Threshold Trap. “We only address
failure modes with an RPN of 120 or higher.” This arbitrary threshold
means a severity 10 failure mode with occurrence 3 and detection 3
(RPN 90) gets ignored while a severity 4, occurrence 6, detection 5
failure mode (RPN 120) gets attention. You are protecting your
spreadsheet instead of protecting your customer.
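
Here is the trap in a few lines of Python, assuming the cutoff is applied as "120 or higher." The failure mode names and the `severity_floor` override are my own illustrative choices, not a rule from any standard.

```python
RPN_THRESHOLD = 120

failure_modes = [
    {"name": "fuel vapor leak", "S": 10, "O": 3, "D": 3},    # RPN 90
    {"name": "cosmetic scratch", "S": 4, "O": 6, "D": 5},    # RPN 120
]


def rpn_of(fm: dict) -> int:
    return fm["S"] * fm["O"] * fm["D"]


def selected_by_threshold(modes, threshold=RPN_THRESHOLD):
    """What a pure RPN cutoff chooses to work on."""
    return [fm["name"] for fm in modes if rpn_of(fm) >= threshold]


def selected_with_severity_override(modes, threshold=RPN_THRESHOLD, severity_floor=9):
    """Same cutoff, but high-severity modes are always included."""
    return [
        fm["name"]
        for fm in modes
        if rpn_of(fm) >= threshold or fm["S"] >= severity_floor
    ]


print(selected_by_threshold(failure_modes))
print(selected_with_severity_override(failure_modes))
```

The pure cutoff selects only the cosmetic scratch. Adding a severity override brings the fuel vapor leak back onto the list, which is the behavior a severity-first approach is meant to guarantee.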

Sin 4: FMEA After the Fact. The design is frozen.
The tooling is built. Production has started. Now someone remembers the
FMEA requirement and scrambles to produce a document. This is not risk
analysis. This is retroactive justification. The whole point of FMEA is
to influence decisions before they’re made, not to document decisions
after they’re locked in.

Sin 5: The File-and-Forget. The FMEA is completed,
signed off, and filed in the quality management system. It is never
updated when the process changes, when new failure modes are discovered
in the field, or when lessons are learned from warranty data. A living
document becomes a dead artifact.


The FMEA That Actually Prevented a Failure

Let me end with a story that illustrates what FMEA looks like when it
works.

A medical device manufacturer was developing an auto-injector for a
critical cardiac medication. During the design FMEA, a reliability
engineer asked a question that made the room uncomfortable: “What
happens if the patient stores the device in a car in Phoenix in
August?”

The team had tested the device at standard conditions. They had
tested at elevated temperatures — 40°C, per the standard. But nobody had
tested at 70°C, which is the interior temperature of a parked car in the
Arizona sun in summer.

The material scientist in the room went quiet. Then she said, “The
spring constant of the actuation spring changes by approximately 15% at
that temperature. The injection force profile would shift.”

They ran the test. At 70°C, the device delivered the medication — but
0.3 seconds slower than specification. In a cardiac emergency, 0.3
seconds matters.

The design was modified. A different spring material was selected. A
thermal shield was added to the housing. The final product worked
correctly from -20°C to 80°C.

The cost of the design change, made during the FMEA: approximately
$40,000 in engineering time and tooling modifications.

The cost of a field failure involving a cardiac medication:
incalculable.

That $40,000 was the cheapest money the company ever spent. And it
was spent because one person in an FMEA session asked a question that
everyone else had assumed was already answered.


Making FMEA a Living Practice

The organizations that get the most value from FMEA don’t treat it as
a milestone. They treat it as a practice — something that evolves
continuously as understanding deepens.

Here’s how they do it:

They update the FMEA when the process changes. Every
engineering change, every new supplier, every equipment modification
triggers a review of the relevant FMEA sections.

They update the FMEA when failures occur. Every
customer complaint, every internal nonconformance, every near-miss is
checked against the FMEA. Was this failure mode already identified? If
yes, were the ratings accurate? If no, why was it missed, and what does
that teach us about our risk thinking?

They connect FMEA to control plans. The control plan
is not a separate document — it’s the operational expression of the
FMEA. Every high-severity or high-priority failure mode in the FMEA
should have a corresponding control in the control plan. If it doesn’t,
you have a gap. If the control plan has controls for risks not in the
FMEA, you have an undocumented risk assessment.
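
The two-way check between FMEA and control plan is simple enough to automate. The sketch below assumes both documents can be exported as lists keyed by a shared failure mode name; the entries and the `coverage_gaps` helper are hypothetical, and a real system would match on stable IDs rather than free text.

```python
def coverage_gaps(fmea_modes, controlled_modes):
    """Return (FMEA modes with no control, controls with no FMEA entry)."""
    fmea_set, control_set = set(fmea_modes), set(controlled_modes)
    missing_controls = sorted(fmea_set - control_set)
    undocumented_risks = sorted(control_set - fmea_set)
    return missing_controls, undocumented_risks


# Hypothetical exports from the two documents
fmea_high_priority = ["clip not fully seated", "clip cracked during pressing"]
control_plan = {
    "clip not fully seated": "seating depth sensor before next station",
    "label misaligned": "vision check at final inspection",
}

missing, undocumented = coverage_gaps(fmea_high_priority, control_plan)
print("high-priority modes without a control:", missing)
print("controls without an FMEA entry:", undocumented)
```

In this made-up data, the cracked clip has no control and the label check has no risk analysis behind it. Either list being non-empty is a prompt for a conversation, not a verdict.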

They use FMEA in management reviews. Not as a status
update (“FMEA completed, 47 failure modes identified, 12 actions
implemented”), but as a strategic risk discussion. “What are our
highest-severity risks? What are we doing about them? Are our controls
adequate? What new risks have emerged since last quarter?”

They train people to think in failure modes. Not
just engineers. Operators, maintenance technicians, material handlers —
anyone who touches the process. When an operator says, “I’ve noticed the
fixture feels different when I load parts on the night shift,” that’s a
potential failure mode. The operator who has been trained to recognize
and report that observation is worth more than any spreadsheet.


The Uncomfortable Truth About FMEA

FMEA will never catch everything. No tool will. The world is too
complex, the interactions too nonlinear, the unknown unknowns too
numerous.

But doing FMEA will catch more than not doing it. And doing FMEA well
will catch more than doing it mechanically. And doing FMEA as a living
practice (continuously updated, connected to real failures, integrated
into decision-making) will catch more than any one-shot exercise.

The supplier who missed the cracked retaining clip? After the
incident, they went back and redid the FMEA. Not to assign blame, but to
understand what they’d missed and why. They found that the failure mode
had been partially identified in an earlier draft — a maintenance
engineer had flagged “fixture wear affecting clip seating force” — but
it had been removed during the review because the team decided the
maintenance schedule was “adequate to prevent that.”

The maintenance schedule was adequate — for normal wear. It wasn’t
adequate for the accelerated wear caused by a specific batch of clips
that had slightly burred edges, creating friction that wore the fixture
faster than expected. Two interacting causes, each individually
controlled, together creating a condition nobody had anticipated.

This is the level of thinking FMEA demands. Not just “what could go
wrong,” but “what could go wrong together.” Not just “what’s our
control,” but “could our control fail too?” Not just “what’s the risk,”
but “are we sure we’re asking the right question?”

FMEA doesn’t guarantee perfection. It guarantees that you thought
about it. And in quality, the act of thinking — genuinely, rigorously,
honestly — is the most powerful tool you have.


Peter Stasko is a Quality Architect with 25+ years of experience
transforming organizations across automotive, aerospace, and
pharmaceutical industries. He has led FMEA programs that have prevented
failures ranging from fuel system defects to medical device
malfunctions, and he has never once met a copy-paste FMEA he didn’t want
to throw across the room.
