Quality
FMEA: When Your Organization Stops Being Surprised by Failures and Starts Predicting Them Before They Happen
You already know what happens next. The line stops. The phone rings.
The customer is furious. Your quality engineer stares at a control plan
that was supposed to prevent this exact thing, and the only question
anyone can think to ask is: Why didn’t we see this coming?
The uncomfortable answer is that you could have. You had the
knowledge. Your engineers knew this failure mode existed. Your operators
had seen it before in a milder form. Your maintenance logs had been
whispering about it for months. But none of that information was ever
organized into a single place, ranked by risk, and turned into a plan
that would have caught it before it reached the customer.
That’s what FMEA does. Not perfectly, not painlessly, and definitely
not without effort — but it does something your organization desperately
needs: it forces you to think about what could go wrong before
it goes wrong, and then it forces you to do something about it.
Failure Mode and Effects Analysis is one of the oldest structured
risk tools in quality engineering. Developed by the U.S. military in the
late 1940s, adopted by NASA in the 1960s, and later embedded
into the automotive core tools framework (alongside APQP, PPAP, MSA, and
SPC), FMEA has been helping organizations anticipate failures for nearly
eighty years. And yet, most organizations still treat it as a paperwork
exercise — a box to check before an audit rather than a living tool that
could prevent the next catastrophe.
Let me show you what happens when you do it right — and what happens
when you don’t.
The Anatomy of a Preventable Disaster
Picture a mid-size automotive supplier in central Europe. They
manufacture precision-machined housings for transmission systems. One
Monday morning, a customer quality engineer calls to report that 400
housings in a recent shipment have threaded holes that are out of
specification. The threads are too shallow — not by much, but enough
that the bolts won’t seat properly under torque. In a transmission,
that’s not a quality issue. That’s a safety issue.
The investigation takes three weeks. The root cause? A tapping tool
that had been gradually wearing over the last six production runs. The
tool life had been set at 8,000 cycles in the control plan, but nobody
had updated the tool change schedule when the material hardness of the
incoming blanks increased slightly after a supplier change. The worn
tool was producing threads that passed the go/no-go gauge at final
inspection but failed under actual assembly torque at the customer’s
plant.
Here’s what makes this painful: every single piece of information
needed to prevent this failure already existed inside the organization.
The supplier change was documented in the incoming material records. The
material hardness shift was captured in the SPC data. The tool wear
pattern was visible in the dimensional trend charts. The assembly torque
failure was predictable from the thread geometry.
But none of these dots were connected — because nobody had ever done
a proper FMEA on this process.
If they had, the failure mode “thread depth out of specification due
to tool wear accelerated by material hardness variation” would have been
identified, scored with a reasonable severity, occurrence, and detection
rating, and addressed with a concrete action plan — probably an
in-process thread depth check at a frequency that would have caught the
drift before 400 bad parts shipped.
That’s the difference between reactive quality and predictive
quality. That’s the difference between writing a corrective action
report and never needing one.
How FMEA Actually Works: Beyond the Spreadsheet
Let me strip away the jargon and explain FMEA the way I explain it to
shop floor teams.
You sit down with the people who know the process best — operators,
engineers, maintenance technicians, quality inspectors — and you ask
three questions, over and over, for every step in your process:
- What could go wrong? (Failure Mode)
- What would happen if it did? (Effect)
- What causes it to happen? (Cause)
Then you score each combination on three dimensions:
- Severity (S): How bad would it be? (1 = barely noticeable, 10 = safety hazard or regulatory violation)
- Occurrence (O): How likely is it? (1 = virtually impossible, 10 = almost certain)
- Detection (D): How likely are you to catch it before it reaches the customer? (1 = almost certain detection, 10 = virtually undetectable)
Multiply them together: S × O × D = Risk Priority Number
(RPN). This single number tells you where to focus your limited
resources. A failure mode with severity 9, occurrence 4, and detection 8
(RPN = 288) is a far more urgent target than one with severity 3,
occurrence 7, and detection 6 (RPN = 126), even though both deserve
attention.
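The arithmetic is short enough to sketch in a few lines of Python. The `FailureMode` class and `rank_by_rpn` function are illustrative names and are not part of any standard or tool. The sketch reproduces the two examples above and rejects any score outside the 1-10 scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One FMEA line item scored on the 1-10 scales described above."""
    description: str
    severity: int    # S: 1 = barely noticeable, 10 = safety hazard or regulatory violation
    occurrence: int  # O: 1 = virtually impossible, 10 = almost certain
    detection: int   # D: 1 = almost certain detection, 10 = virtually undetectable

    def __post_init__(self) -> None:
        # Reject scores outside the 1-10 scale before they distort the ranking.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: S x O x D (range 1 to 1000)."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes: list[FailureMode]) -> list[FailureMode]:
    """Return the failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# The two examples from the text.
modes = [
    FailureMode("Example B", severity=3, occurrence=7, detection=6),
    FailureMode("Example A", severity=9, occurrence=4, detection=8),
]
for mode in rank_by_rpn(modes):
    print(mode.description, mode.rpn)
# Example A 288
# Example B 126
```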
But here’s where most organizations go wrong: they treat the RPN as
the output instead of the input. The goal isn’t to calculate a number.
The goal is to use that number to decide what you’re going to
do about it.
The Three Types of FMEA, and When to Use Each
Not all FMEAs are created equal. Understanding which type applies to
your situation is the first step toward doing it effectively.
Design FMEA (DFMEA) is performed on the product
itself. It asks: what could fail in this design, and how would that
affect the customer? DFMEA is most powerful when it’s done early —
before tooling is cut, before materials are sourced, before the design
is locked. A DFMEA that’s done after the design is finalized is an
autopsy, not a diagnosis.
Process FMEA (PFMEA) is performed on the
manufacturing or service process. It asks: what could go wrong during
production, and how would that affect the product? PFMEA is where most
quality practitioners spend their time, and it’s where the biggest gains
are often found — because manufacturing processes are where most
failures actually originate.
System FMEA (SFMEA) looks at the entire system — how
components and subsystems interact. It’s less common in practice but
critically important for complex products where the failure isn’t in any
single part but in the interaction between parts.
The hierarchy matters: you do DFMEA first (because you can’t
manufacture your way out of a bad design), then PFMEA (because you can’t
inspect your way out of a bad process), and the outputs of both feed
into your control plan.
The Five Mistakes That Kill Every FMEA
I’ve reviewed hundreds of FMEAs across automotive, aerospace, and
pharmaceutical companies. The same mistakes show up everywhere. Here are
the five that do the most damage:
Mistake 1: Doing it alone. An FMEA built by one
engineer sitting at a desk is not an FMEA — it’s an opinion. The power
of the tool comes from the diversity of perspectives around the table.
The operator knows things the engineer doesn’t. The maintenance
technician knows things the operator doesn’t. The supplier quality
engineer knows things nobody in your plant knows. If your FMEA team has
fewer than four people, you’re missing something.
Mistake 2: Treating it as a paperwork exercise. The
moment someone says “We need to finish the FMEA for the audit,” you’ve
already lost. FMEA is not documentation of what you already know — it’s
discovery of what you don’t. If your FMEA sessions feel like filling out
forms instead of having difficult conversations about what could go
wrong, you’re doing it wrong.
Mistake 3: Scoring everything as a 5. I call this
“FMEA golf” — the tendency to rate every failure mode as severity 5,
occurrence 5, detection 5, producing an RPN of 125 for everything. This
makes the entire exercise meaningless. If everything has the same risk,
nothing has priority. The scoring must reflect genuine differences in
risk. Have the arguments. Disagree. That’s where the value lives.
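You can catch FMEA golf before an auditor does by measuring how much the scores actually vary. The sketch below is a hypothetical heuristic and not a recognized metric. `looks_like_fmea_golf` flags an FMEA when more than half of its rows share the same RPN. The function names and the 0.5 cut-off are assumptions that you would need to tune.

```python
from collections import Counter

Score = tuple[int, int, int]  # (severity, occurrence, detection)


def scoring_spread(scores: list[Score]) -> dict:
    """Summarize how much an FMEA's S/O/D scores actually vary."""
    if not scores:
        raise ValueError("no scores to analyze")
    rpns = [s * o * d for s, o, d in scores]
    _, top_count = Counter(rpns).most_common(1)[0]
    return {
        "rows": len(scores),
        "distinct_rpns": len(set(rpns)),
        # Fraction of rows that share the single most common RPN.
        "share_most_common_rpn": top_count / len(scores),
        # Fraction of rows scored exactly 5 / 5 / 5.
        "share_all_fives": sum(row == (5, 5, 5) for row in scores) / len(scores),
    }


def looks_like_fmea_golf(scores: list[Score], max_share: float = 0.5) -> bool:
    """Flag an FMEA where more than max_share of rows have the same RPN.

    The 0.5 default is an arbitrary, hypothetical cut-off.
    """
    return scoring_spread(scores)["share_most_common_rpn"] > max_share


golf = [(5, 5, 5)] * 8 + [(5, 4, 5), (6, 5, 5)]
honest = [(9, 4, 8), (3, 7, 6), (7, 3, 4), (2, 2, 9)]
print(looks_like_fmea_golf(golf), looks_like_fmea_golf(honest))  # True False
```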
Mistake 4: FMEA as a one-time event. Your process
changes. Your materials change. Your equipment ages. Your people turn
over. If your FMEA doesn’t change with these things, it’s a museum piece
— interesting to look at, useless for preventing failures. A living FMEA
is reviewed and updated at every significant process change, every
customer complaint, every internal nonconformance, and at minimum once
per year.
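The review triggers from Mistake 4 can be written as explicit rules, so nobody has to remember them. In this sketch, `FmeaStatus` and `review_reasons` are hypothetical names, and "once per year" is approximated as 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# "Once per year" approximated as 365 days; an assumption of this sketch.
MAX_REVIEW_AGE = timedelta(days=365)


@dataclass
class FmeaStatus:
    """Hypothetical record of what has happened since the last FMEA review."""
    last_review: date
    process_changes: int = 0       # significant process changes since last review
    customer_complaints: int = 0   # customer complaints since last review
    internal_ncs: int = 0          # internal nonconformances since last review


def review_reasons(status: FmeaStatus, today: date) -> list[str]:
    """Return every reason the FMEA is due for review (empty list = not due)."""
    reasons = []
    if status.process_changes:
        reasons.append("significant process change")
    if status.customer_complaints:
        reasons.append("customer complaint")
    if status.internal_ncs:
        reasons.append("internal nonconformance")
    if today - status.last_review >= MAX_REVIEW_AGE:
        reasons.append("annual review overdue")
    return reasons


status = FmeaStatus(last_review=date(2024, 3, 1), customer_complaints=1)
print(review_reasons(status, today=date(2024, 9, 1)))  # ['customer complaint']
print(review_reasons(status, today=date(2025, 3, 15)))
# ['customer complaint', 'annual review overdue']
```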
Mistake 5: Actions without owners and deadlines.
Every high-RPN failure mode needs a specific action plan: who will do
what by when. If your FMEA has a column for “recommended actions” that’s
filled with vague statements like “improve detection” or “monitor more
closely,” you haven’t completed the FMEA — you’ve procrastinated in a
structured format.
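The "who will do what by when" test can be applied mechanically to every recommended action. The `Action` class, the `action_problems` function, and the `VAGUE_PHRASES` list below are illustrative assumptions. The phrase list holds only the two examples quoted above, so a plain substring match like this one will miss any vague wording it has not been given.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Phrases the text calls out as too vague to count as actions.
# This list is illustrative; extend it with your own organization's favorites.
VAGUE_PHRASES = ("improve detection", "monitor more closely")


@dataclass
class Action:
    failure_mode: str
    description: str
    owner: Optional[str] = None
    due: Optional[date] = None


def action_problems(action: Action) -> list[str]:
    """List what keeps this action from being 'who will do what by when'."""
    problems = []
    if not (action.owner and action.owner.strip()):
        problems.append("no owner")
    if action.due is None:
        problems.append("no deadline")
    text = action.description.strip().lower()
    if not text:
        problems.append("no description")
    elif any(phrase in text for phrase in VAGUE_PHRASES):
        problems.append("vague description")
    return problems


vague = Action("Thread depth out of specification", "Improve detection")
concrete = Action(
    "Thread depth out of specification",
    "Add in-process thread depth check at the tapping station",
    owner="Process engineer",
    due=date(2025, 3, 31),
)
print(action_problems(vague))     # ['no owner', 'no deadline', 'vague description']
print(action_problems(concrete))  # []
```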
The AIAG-VDA Harmonization: What Changed and Why It Matters
If you’re in automotive, you know that in 2019, AIAG and VDA
published a harmonized FMEA handbook that changed the methodology
significantly. The most important changes:
The RPN was replaced by the Action Priority (AP)
system. Instead of a single numeric score, the new method uses
a matrix of Severity, Occurrence, and Detection to assign one of three
priorities: High (H), Medium (M), or Low (L). This eliminates the old
problem of “our threshold is 100, so we’ll address 105 but ignore 95” —
a distinction that was always mathematically arbitrary.
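The threshold problem is easy to reproduce. In the sketch below, a fixed RPN cut-off of 100 skips a severity-10 failure mode with RPN 80 but flags a severity-3 one with RPN 108. The `illustrative_priority` function is a made-up, deliberately simplified severity-first rule. It is not the AIAG-VDA Action Priority table, so use the handbook's tables for real work.

```python
RPN_THRESHOLD = 100  # a fixed cut-off of the kind the text criticizes


def rpn_says_act(s: int, o: int, d: int, threshold: int = RPN_THRESHOLD) -> bool:
    """Old-style rule: act only if S x O x D reaches the threshold."""
    return s * o * d >= threshold


def illustrative_priority(s: int, o: int, d: int) -> str:
    """Hypothetical severity-first H/M/L rule.

    NOT the AIAG-VDA Action Priority table; it only mimics the idea that
    severity is looked at first instead of being diluted inside a product.
    """
    if s >= 9 and (o >= 4 or d >= 5):
        return "H"
    if s >= 7 and o >= 4:
        return "H"
    if s >= 9 or (s >= 5 and o >= 4 and d >= 5):
        return "M"
    return "L"


# A safety-critical mode with modest O and D vs. a minor but frequent one.
for s, o, d in [(10, 2, 4), (3, 6, 6)]:
    print(s * o * d, rpn_says_act(s, o, d), illustrative_priority(s, o, d))
# 80 False M
# 108 True L
```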
The FMEA structure became more rigorous. The new
seven-step approach (Planning and Preparation, Structure Analysis, Function
Analysis, Failure Analysis, Risk Analysis, Optimization, Results
Documentation) forces a more systematic and thorough analysis than the
old fill-in-the-grid approach.
The focus shifted from documentation to action. The
new handbook explicitly states that an FMEA without concrete, completed
actions is incomplete. This is a cultural shift as much as a
methodological one — it signals that the value of FMEA lives in what you
do, not what you write.
Whether you’re using the old AIAG format, the VDA format, or the
harmonized version, the principle remains the same: think about failures
before they happen, rank them by risk, and act on the highest risks
first. The format is the vehicle. The thinking is the destination.
The FMEA-Control Plan Connection That Most Organizations Miss
Here’s a pattern I see repeatedly: a team does a solid PFMEA,
identifies real failure modes, scores them honestly, and even writes
decent action plans. Then they file it away and build their control plan
from scratch.
This is like hiring a detective to investigate a crime, getting a
detailed report of who did it and how, and then ignoring the report when
you design your security system.
The control plan should be the direct output of the FMEA.
Every high-risk failure mode identified in the FMEA should have a
corresponding control in the control plan — a specific inspection, test,
or prevention mechanism that addresses that exact failure mode. The FMEA
tells you what to worry about. The control plan tells you what to do
about it. They are two halves of the same system.
When I audit organizations, one of the first things I check is
whether I can trace a line from the highest-risk items in the FMEA to
specific controls in the control plan. In most places, that traceability
doesn’t exist. The FMEA was done by one team, the control plan by
another, and the only thing they share is a file folder on a shared
drive.
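The traceability check described above can be sketched as a set difference. First collect every FMEA item that some control plan entry references, then list the high-risk items that are left over. Several details here are hypothetical: the ID scheme (`PF-010` and so on), the `fmea_refs` field, and the second and third failure modes. Real control plans link back to the FMEA in whatever way your template or software supports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FmeaItem:
    item_id: str       # hypothetical ID used to link FMEA rows to controls
    failure_mode: str
    priority: str      # "H", "M" or "L" (an AP level, or your own RPN band)


@dataclass(frozen=True)
class ControlPlanEntry:
    characteristic: str
    control_method: str
    fmea_refs: frozenset[str]  # item_ids of the FMEA rows this control addresses


def untraced_items(fmea: list[FmeaItem], plan: list[ControlPlanEntry],
                   levels: tuple[str, ...] = ("H",)) -> list[FmeaItem]:
    """Return FMEA items at the given priority levels with no linked control."""
    covered: set[str] = set()
    for entry in plan:
        covered |= entry.fmea_refs
    return [item for item in fmea
            if item.priority in levels and item.item_id not in covered]


fmea = [
    FmeaItem("PF-010", "Thread depth out of specification", "H"),
    FmeaItem("PF-020", "Burr on sealing face", "H"),
    FmeaItem("PF-030", "Label misprint", "L"),
]
plan = [
    ControlPlanEntry("Thread depth", "In-process thread depth check",
                     frozenset({"PF-010"})),
]
for item in untraced_items(fmea, plan):
    print(item.item_id, item.failure_mode)  # PF-020 Burr on sealing face
```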
When FMEA Becomes a Competitive Advantage
Let me tell you about a company that gets it right. A German
precision components manufacturer I worked with had a rule: no new
product launch without a completed, reviewed, and approved FMEA. Not a
formality — a genuine gate. If the FMEA was incomplete or inadequate,
the launch didn’t happen. Period.
The result? Their customer warranty claims dropped by 62% over three
years. Their launch success rate (defined as zero customer quality
issues in the first 90 days of production) went from 73% to 94%. Their
quality engineers spent 80% less time firefighting and 80% more time
preventing.
But the real advantage wasn’t the numbers. It was the culture.
Engineers at this company didn’t view FMEA as a burden — they viewed it
as a thinking tool that helped them design better products and
processes. Operators asked to participate in FMEA sessions because
they’d seen firsthand how the improvements that came from those sessions
made their jobs easier. Maintenance technicians came voluntarily because
they knew the resulting preventive maintenance schedules would reduce
breakdowns.
When FMEA becomes part of how your organization thinks — not just
what it documents — you’ve achieved something most companies never will.
You’ve built the ability to predict failures into your organizational
DNA.
The Practical Guide to Doing FMEA Right
If you’re reading this and thinking “we need to do better,” here’s
where to start:
Week 1: Pick one process. Don’t try to FMEA
everything at once. Pick your highest-risk, highest-volume, or
most-complained-about process and start there. The lessons you learn
will make every subsequent FMEA faster and better.
Week 2: Build the right team. You need the process
engineer, the operator, the maintenance technician, the quality
inspector, and the supplier quality engineer (if applicable). Block four
hours. Provide coffee. Ban laptops except for the person running the
session.
Week 3: Do the analysis. Walk through every step of
the process. For each step, identify failure modes, effects, causes, and
current controls. Score severity, occurrence, and detection honestly.
Calculate the RPN or assign the Action Priority. Argue. Debate.
Disagree. That’s the sound of risk being identified.
Week 4: Take action. For every high-risk item,
define a specific action: a design change, a process change, a new
inspection method, a training update. Assign an owner and a deadline.
Put it in the project tracker. Review it weekly.
Ongoing: Keep it alive. Review the FMEA at every
process change, every customer complaint, every audit finding. Update
the scores. Close the actions. Add new failure modes as you discover
them. Make it a living document, not a historical artifact.
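The weekly review from Week 4 mostly comes down to one question: which open actions are past their deadline? The sketch below shows one way to split open actions into an overdue list and an upcoming list. It uses hypothetical names (`TrackedAction`, `weekly_review`) and made-up actions, owners, and dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TrackedAction:
    description: str
    owner: str
    due: date
    completed: Optional[date] = None  # None while the action is still open


def weekly_review(actions: list[TrackedAction], today: date):
    """Split open actions into (overdue, upcoming), each sorted by due date."""
    open_actions = [a for a in actions if a.completed is None]
    overdue = sorted((a for a in open_actions if a.due < today), key=lambda a: a.due)
    upcoming = sorted((a for a in open_actions if a.due >= today), key=lambda a: a.due)
    return overdue, upcoming


actions = [
    TrackedAction("Add in-process thread depth check", "Process engineer", date(2025, 3, 1)),
    TrackedAction("Update tool change interval", "Maintenance", date(2025, 2, 15)),
    TrackedAction("Train operators on new gauge", "Quality inspector", date(2025, 2, 1),
                  completed=date(2025, 1, 30)),
    TrackedAction("Revise control plan", "Quality engineer", date(2025, 4, 1)),
]
overdue, upcoming = weekly_review(actions, today=date(2025, 3, 10))
print([a.description for a in overdue])
# ['Update tool change interval', 'Add in-process thread depth check']
print([a.description for a in upcoming])  # ['Revise control plan']
```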
The Real ROI of FMEA
Organizations often ask me to justify the time investment in FMEA.
The math is straightforward.
A typical PFMEA for a moderately complex manufacturing process takes
16-40 hours of team time. Let’s call it 30 hours across five people —
roughly 150 person-hours. At a fully loaded cost of $75 per hour, that’s
an $11,250 investment.
Now consider the cost of a single quality escape that reaches the
customer: containment, sorting, replacement parts, expedited shipping,
customer line stoppage penalties, 8D investigation time, corrective
action implementation, and the intangible cost of damaged customer
confidence. For an automotive supplier, a single significant escape
typically costs between $50,000 and $500,000.
The ROI isn’t even close. One prevented failure pays for years of
FMEA work. And that’s before you count the internal savings: fewer
rejects, less rework, higher first-pass yield, shorter startup times on
new products, and a quality engineering team that spends its time
improving the future instead of apologizing for the past.
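The arithmetic behind this argument takes only a few lines to check. The sketch uses only the figures from the text and assumes the same five-person team at $75 per hour across the whole 16-40 hour range. The function names are illustrative.

```python
def fmea_cost(team_hours: float, people: int, hourly_rate: float) -> float:
    """Fully loaded cost of one FMEA: hours x people x rate."""
    return team_hours * people * hourly_rate


def fmeas_paid_by_one_escape(escape_cost: float, cost_per_fmea: float) -> float:
    """How many FMEAs of this size the cost of one escape would have funded."""
    return escape_cost / cost_per_fmea


# Figures from the text: 16-40 hours, five people, $75/h, escapes of $50k-$500k.
for hours in (16, 30, 40):
    cost = fmea_cost(hours, people=5, hourly_rate=75.0)
    low = fmeas_paid_by_one_escape(50_000, cost)
    high = fmeas_paid_by_one_escape(500_000, cost)
    print(f"{hours} h: cost ${cost:,.0f}, one escape covers {low:.1f} to {high:.1f} FMEAs")
# 16 h: cost $6,000, one escape covers 8.3 to 83.3 FMEAs
# 30 h: cost $11,250, one escape covers 4.4 to 44.4 FMEAs
# 40 h: cost $15,000, one escape covers 3.3 to 33.3 FMEAs
```

Even at the least favorable end of both ranges, the cost of one escape would fund more than three FMEAs of this size.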
What I’ve Learned After 25 Years of FMEA
The organizations that get the most from FMEA share three traits:
First, they respect the process enough to do it
thoroughly but not so reverently that they lose sight of the
purpose. FMEA is a means to an end, not an end in itself. A good-enough
FMEA that drives real action beats a perfect FMEA that sits in a binder
every time.
Second, they involve the right people — not just the
people with the right titles, but the people with the right knowledge.
The operator who’s been running that machine for fifteen years knows
more about how it fails than the engineer who designed the process. The
maintenance technician who’s been replacing that bearing every three
months knows more about the failure pattern than the SPC chart that was
set up to monitor it.
Third, they close the loop. Every action gets
tracked to completion. Every completed action gets verified for
effectiveness. Every verification feeds back into the FMEA as an updated
detection or occurrence score. The wheel turns continuously, and each
revolution makes the process more robust.
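Closing the loop means the FMEA records both the verified action and the score it changed. In this sketch, `FmeaRow` and `record_verified_action` are hypothetical names and the scores are made up for illustration. Severity is deliberately left untouched, because it is usually expected to change only through a design change.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FmeaRow:
    failure_mode: str
    severity: int
    occurrence: int
    detection: int
    # (action, old occurrence, old detection, old RPN) for each verified action
    history: list = field(default_factory=list)

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


def record_verified_action(row: FmeaRow, action: str,
                           new_occurrence: Optional[int] = None,
                           new_detection: Optional[int] = None) -> int:
    """Re-score a row once an action has been verified as effective.

    Severity is not changed here; only occurrence and detection are updated.
    The previous scores are kept in row.history. Returns the new RPN.
    """
    row.history.append((action, row.occurrence, row.detection, row.rpn))
    if new_occurrence is not None:
        row.occurrence = new_occurrence
    if new_detection is not None:
        row.detection = new_detection
    return row.rpn


# Hypothetical scores, for illustration only.
row = FmeaRow("Thread depth out of specification", severity=9, occurrence=4, detection=8)
print(row.rpn)  # 288
print(record_verified_action(row, "In-process thread depth check verified",
                             new_detection=3))  # 108
print(row.history)  # [('In-process thread depth check verified', 4, 8, 288)]
```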
FMEA won’t prevent every failure. No tool will. But it will prevent
the failures you should have seen coming — the ones that were
predictable, whose causes were known, whose effects were foreseeable,
and whose prevention was affordable. And in a world where a single
quality escape can cost more than an entire year’s quality budget,
preventing the predictable isn’t just good practice. It’s the difference
between organizations that survive and organizations that thrive.
Your next quality crisis is already forming somewhere in your
process. The question isn’t whether you’ll face it — it’s whether you’ll
see it coming.
Peter Stasko is a Quality Architect with 25+ years
of experience transforming organizations across automotive, aerospace,
and pharmaceutical industries. He specializes in building quality
systems that don’t just comply with standards — they prevent the
failures that standards were written to address. His approach combines
deep technical expertise in core quality tools with an understanding of
the organizational psychology that determines whether those tools
actually get used.