Quality and Survivorship Bias: When Your Organization Studies Only Its Successes and Repeats the Failures It Never Examined — and the Lessons You Drew From What Survived Became the Blind Spots That Sank What Didn’t


During World War II, the Allied command faced a critical problem.
Bombers were returning from missions riddled with bullet holes, and the
military wanted to reinforce the aircraft to improve survival rates.
Engineers examined the returning planes and mapped the pattern of
damage. The data was clear: the fuselage, outer wings, and tail section
showed the heaviest concentration of hits.

The obvious recommendation was to add armor to those areas.

A mathematician named Abraham Wald stopped them. His insight was
deceptively simple and profoundly counterintuitive: the planes they were
examining had survived. The bullet holes they saw were in the places a
plane could be hit and still fly home. The planes that were hit in the
engines or the cockpit never came back. The armor, Wald argued, should
go where the surviving planes showed no damage at all — because those
were the hits that were fatal.

This is survivorship bias, and it is one of the most dangerous
cognitive traps in manufacturing quality. You study what succeeded. You
ignore what failed. And the strategies you develop from your survivors
become the very strategies that guarantee more failures you will never
examine.

The Anatomy of an Invisible Problem

Survivorship bias distorts quality management in ways that are often
invisible precisely because the evidence of the distortion is missing.
In a manufacturing environment, the “survivors” are the products that
passed inspection, the production runs that met their targets, the
suppliers that delivered on time, and the process changes that seemed to
work. The “casualties” are the scrapped parts, the failed runs, the
rejected suppliers, and the abandoned initiatives.

The problem arises when you build your understanding of quality
exclusively from the survivors.

Consider a plant that has implemented a new machining process. After
six months, the defect rate has dropped, throughput has increased, and
the leadership team is preparing a case study for the rest of the
organization. They document the steps they took, the parameters they
selected, and the training they provided. Other plants are expected to
replicate the approach.

What the case study does not capture are the three months of aborted
attempts before the process stabilized, the two operators who were
reassigned because they could not adapt, the batch of material that was
quietly scrapped at a cost nobody reported to finance, and the specific
machine conditions — ambient temperature, tool wear state, coolant
concentration — that happened to align perfectly during the successful
runs but were never controlled as formal process parameters.

The case study is the returning bomber. The failures are the ones
that never came back.

Where Survivorship Bias Hides in Quality Systems

Process Optimization

When engineers optimize a process, they typically study the runs that
produced conforming parts and adjust parameters to replicate those
conditions. But if a significant number of runs produced scrap and were
discarded from the analysis — perhaps because the data was never
captured, or because the runs were attributed to “operator error” and
excluded — then the optimization is based on an incomplete picture.

The result is a process window that looks robust but has hidden
vulnerabilities. The parameters that caused failures are not understood
because they were never studied. When those conditions reappear —
perhaps under a different shift, with a different material lot, or
during a different season — the failures return, and nobody can explain
why.

The engineers look at their control charts, see that the process is
“in control,” and conclude that the failure is an anomaly. But the
anomaly was always there. It was just excluded from the dataset.
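To make the point concrete, here is a minimal Python sketch. It assumes each production run is logged with one uncontrolled condition (relative humidity) and a pass/fail result. The run data, the 10-point humidity bands, and the function name `failure_rate_by_band` are all hypothetical, invented for illustration. The same analysis is run twice: once on every run, and once on a “survivor” dataset from which the nonconforming runs have been dropped.

```python
from collections import defaultdict

# Hypothetical run records: (relative_humidity_percent, conforming).
# These values are invented for illustration; they are not plant data.
RUNS = [
    (32, True), (35, True), (38, True), (41, True), (44, True),
    (47, True), (52, False), (55, True), (58, False), (61, False),
    (63, False), (66, True),
]

def failure_rate_by_band(runs, band_width=10):
    """Group runs into humidity bands and return {band_start: failure_rate}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for humidity, conforming in runs:
        band = (humidity // band_width) * band_width
        totals[band] += 1
        if not conforming:
            failures[band] += 1
    return {band: failures[band] / totals[band] for band in sorted(totals)}

# Full dataset: failures included, so the weak region is visible.
full = failure_rate_by_band(RUNS)

# Survivor dataset: only conforming runs, as if scrap runs were never
# logged or were excluded as "operator error".
survivors = failure_rate_by_band([run for run in RUNS if run[1]])

print(full)       # {30: 0.0, 40: 0.0, 50: 0.6666666666666666, 60: 0.6666666666666666}
print(survivors)  # {30: 0.0, 40: 0.0, 50: 0.0, 60: 0.0}
```

On the full dataset, the bands above 50% humidity fail about two runs in three. On the survivor dataset every band reads 0.0, so the condition that drives the failures cannot even show up in the analysis.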

Supplier Selection

Organizations evaluate suppliers based on the ones they kept, not the
ones they rejected. A supplier that has been on the approved list for
five years is assumed to be good because it has been supplying
conforming material for five years. But what about the three suppliers
that were rejected during the same period? What were their failure
modes? Were they rejected for legitimate quality reasons, or were they
rejected because their documentation format was unfamiliar, because
their sales representative was less polished, or because their pricing
structure did not match the procurement team’s expectations?

If you never study the rejected suppliers, you cannot know whether
your selection criteria are actually predicting quality performance or
merely correlating with organizational comfort.

More importantly, you cannot know whether one of the rejected
suppliers might have offered a genuinely superior solution to a problem
you have been unable to solve with your current supply base.

Root Cause Analysis

Root cause analysis is particularly vulnerable to survivorship bias
when it is applied only to the defects that escape to the customer.
Internal defects that are caught and reworked are often treated as less
significant, even though they may represent the same underlying cause at
an earlier stage of the process.

An organization that investigates only customer complaints and field
failures is studying the bombers that made it back with damage. The
defects that were caught internally — the ones that never reached the
customer — may represent different failure modes entirely, and they may
be more frequent and more revealing than the escapes.

A comprehensive root cause analysis program must examine all defects,
not just the ones that survived the detection system.
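A simple Pareto count shows how much the choice of dataset changes the picture. The sketch below is a minimal Python illustration, assuming a defect log in which each entry records a cause and the point where the defect was detected; the causes, detection labels, and counts are invented. It ranks causes across all detection points, then ranks only the customer escapes.

```python
from collections import Counter

# Hypothetical defect log: (cause, where the defect was detected).
DEFECTS = [
    ("burr", "in_process"), ("burr", "in_process"), ("burr", "final_inspection"),
    ("burr", "in_process"), ("short_shot", "in_process"),
    ("short_shot", "final_inspection"), ("mislabel", "customer"),
    ("mislabel", "customer"), ("porosity", "customer"), ("porosity", "in_process"),
]

def pareto(defects):
    """Rank defect causes by frequency, most common first."""
    return Counter(cause for cause, _ in defects).most_common()

all_defects = pareto(DEFECTS)
escapes_only = pareto([d for d in DEFECTS if d[1] == "customer"])

print("All detection points:", all_defects)
print("Customer escapes only:", escapes_only)
```

In this invented log the most frequent cause, burrs, never reaches the customer, so a program that analyzes only escapes would not rank it at all.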

Benchmarking and Best Practices

Benchmarking is built on survivorship bias. You visit the plant that
has the best quality metrics in the industry, you document their
practices, and you attempt to implement them at your own facility. What
you do not see are the plants that tried the same practices and failed —
the ones that invested in the same automation, adopted the same
organizational structure, and implemented the same quality management
system, but did not achieve the same results.

You also do not see the specific conditions at the benchmark plant
that enabled its success: a workforce with unusually low turnover, a
customer base that provides stable demand, a product design that is
inherently forgiving of process variation, or a regulatory environment
that imposes requirements that happen to align with the practices being
benchmarked.

The benchmark plant is a survivor. Studying it without understanding
why other implementations failed is like studying the bullet holes on
the returning bombers.

The Cost of Drawing Lessons From Survivors Only

The financial impact of survivorship bias in quality management is
substantial but difficult to quantify, precisely because the costs are
incurred by the failures that are never examined.

Failed process improvements consume engineering time, operator
training hours, material costs, and production capacity. When these
failures are dismissed as implementation problems rather than analyzed
as data, the organization repeats the same mistakes. It invests in the
next improvement initiative with the same flawed assumptions, encounters
the same hidden obstacles, and fails in the same ways — each time
attributing the failure to execution rather than understanding.

Supplier failures that result in line stoppages, premium freight, or
emergency sourcing represent direct costs that often exceed the savings
from the original sourcing decision. But if the organization studies
only its successful supplier relationships, it cannot identify the
selection criteria that predict failure, and it cannot improve its
sourcing process.

The most insidious cost, however, is the false confidence that
survivorship bias creates. Organizations that study only their successes
develop an inflated sense of their own competence. They believe they
understand their processes, their suppliers, and their markets better
than they actually do. This overconfidence leads to riskier decisions,
larger bets on unproven approaches, and a reduced investment in the
detection and prevention systems that might catch the failures they are
not anticipating.

A Manufacturing Case Study: The Process That Could Not Fail

A medical device manufacturer had a molding process that had produced
conforming parts for over two years. The process was considered
validated and stable. During a routine revalidation triggered by a minor
equipment change, the process suddenly began producing parts with
dimensional nonconformances that had never been seen before.

The investigation revealed that the original validation had been
conducted during a period when the ambient humidity in the facility
happened to be unusually low. The process parameters that produced
conforming parts during validation were effective only within a narrow
humidity range, and the facility’s humidity had since drifted outside
it. The two years of conforming production reflected favorable
environmental conditions, not an inherently robust process.

The original validation was a survivor. It succeeded under specific
conditions that were not understood and not controlled. The failure,
when it finally occurred, was catastrophic: production was halted for
six weeks while the process was revalidated with proper environmental
controls.

The cost of the shutdown was significant. The cost of the original
oversight — the failure to study the conditions that might have caused
the process to fail during validation — was immeasurable, because those
conditions had never been tested.

How to Counter Survivorship Bias in Quality Management

Study Your Failures Systematically

Create a structured process for analyzing every significant failure,
not just the ones that reach the customer. Maintain a failure database
that captures the conditions, the root cause, the corrective action, and
— critically — the cost. Review this database regularly, not just when a
new failure occurs.

Include near-misses. A near-miss is a failure that almost happened
but was caught by the detection system. Near-misses are the bullet holes
in the fuselage: they show you where the process is vulnerable, and they
are far more common than actual failures.
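A failure database of this kind can start as a very small data structure. The following Python sketch assumes hypothetical field names that mirror the paragraph above (conditions, root cause, corrective action, cost) plus a `near_miss` flag; the `review` function and the sample records are illustrative, not a prescribed schema. The review rolls up events and cost by failure mode so that near-misses sit next to the failures they foreshadow.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class FailureRecord:
    """One entry in a hypothetical failure database; field names are illustrative."""
    failure_mode: str
    conditions: dict          # e.g. shift, material lot, machine, ambient readings
    root_cause: str
    corrective_action: str
    cost: float               # cost attributed to this event, in one currency
    near_miss: bool = False   # True if detection caught it before it became a failure

def review(records):
    """Roll up failures, near-misses, and cost per failure mode."""
    summary = defaultdict(lambda: {"failures": 0, "near_misses": 0, "cost": 0.0})
    for rec in records:
        entry = summary[rec.failure_mode]
        entry["near_misses" if rec.near_miss else "failures"] += 1
        entry["cost"] += rec.cost
    return dict(summary)

# Hypothetical records for illustration only.
records = [
    FailureRecord("seal leak", {"shift": "night"}, "worn gasket", "shorten PM interval", 1200.0),
    FailureRecord("seal leak", {"shift": "night"}, "worn gasket", "none yet", 0.0, near_miss=True),
    FailureRecord("seal leak", {"shift": "day"}, "worn gasket", "none yet", 0.0, near_miss=True),
    FailureRecord("crack", {"material_lot": "hypothetical-01"}, "brittle lot",
                  "supplier corrective action", 800.0),
]
summary = review(records)
print(summary["seal leak"])  # {'failures': 1, 'near_misses': 2, 'cost': 1200.0}
```

Here the seal-leak mode shows one realized failure but two near-misses, the kind of early signal a complaint-only review would never see.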

Include Rejected Alternatives in Your Analysis

When you select a process parameter, a supplier, a technology, or an
organizational approach, document the alternatives that were rejected
and the reasons for their rejection. Periodically review these rejected
alternatives to determine whether the rejection reasons were based on
data or on assumptions that may no longer be valid.

This practice pairs naturally with a pre-mortem: before implementing
a decision, ask what would cause it to fail, and then check whether
those failure modes are represented in your rejected alternatives.
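A decision log for rejected alternatives can be equally simple. This Python sketch assumes each rejection is tagged with a `basis` of either "data" or "assumption" and uses a one-year review interval; the tagging scheme, the interval, and the sample entries are hypothetical choices, not a standard. The `due_for_review` function surfaces old rejections that were never backed by data.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class RejectedAlternative:
    """Hypothetical decision-log entry for an option that was not chosen."""
    decision: str
    alternative: str
    reason: str
    basis: str            # "data" if backed by test results, "assumption" otherwise
    rejected_on: date

def due_for_review(log, today, max_age_days=365):
    """Return assumption-based rejections at least max_age_days old."""
    cutoff = today - timedelta(days=max_age_days)
    return [e for e in log if e.basis == "assumption" and e.rejected_on <= cutoff]

# Hypothetical log entries for illustration only.
log = [
    RejectedAlternative("resin supplier", "Supplier B", "unfamiliar certificate format",
                        "assumption", date(2022, 3, 1)),
    RejectedAlternative("resin supplier", "Supplier C", "failed moisture test",
                        "data", date(2022, 3, 1)),
    RejectedAlternative("cure time", "90 s", "expected warpage",
                        "assumption", date(2024, 11, 1)),
]
for entry in due_for_review(log, today=date(2025, 1, 1)):
    print(entry.decision, "|", entry.alternative, "|", entry.reason)
```

Only Supplier B is returned: its rejection rested on an assumption and is more than a year old, while Supplier C was rejected on test data and the cure-time entry is still recent.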

Examine the Conditions of Success, Not Just the Outcomes

When a process succeeds, a supplier delivers, or a project completes
on time, do not simply celebrate and move on. Investigate the conditions
that enabled the success. Were they controlled? Are they sustainable?
Could they change without warning?

A process that produces conforming parts because the ambient
temperature happened to be within a favorable range is not a robust
process. It is a lucky process. Understanding the conditions of success
allows you to control those conditions and make the success
reproducible.
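One way to make the conditions of success explicit is to record the range each uncontrolled condition actually spanned during the runs that succeeded, then check current conditions against it. The Python sketch below assumes conditions are logged as name-to-number mappings; the condition names, values, and helper functions are hypothetical, loosely modeled on the humidity case study. Anything outside the observed range, or never recorded at all, is flagged as unvalidated territory rather than treated as known-safe.

```python
def validated_ranges(validation_runs):
    """Return {condition: (min, max)} as observed across successful validation runs."""
    ranges = {}
    for run in validation_runs:
        for name, value in run.items():
            low, high = ranges.get(name, (value, value))
            ranges[name] = (min(low, value), max(high, value))
    return ranges

def outside_validated(current, ranges):
    """Flag current conditions that validation never covered."""
    flagged = []
    for name, value in current.items():
        if name not in ranges:
            flagged.append((name, "never recorded during validation"))
        else:
            low, high = ranges[name]
            if not low <= value <= high:
                flagged.append((name, f"{value} outside validated range {low}-{high}"))
    return flagged

# Hypothetical conditions logged during three successful validation runs.
validation = [
    {"humidity_pct": 28, "ambient_c": 21.0},
    {"humidity_pct": 31, "ambient_c": 22.5},
    {"humidity_pct": 30, "ambient_c": 21.8},
]
ranges = validated_ranges(validation)

# Hypothetical conditions on the production floor today.
today = {"humidity_pct": 46, "ambient_c": 22.0, "coolant_conc_pct": 6.5}
for name, reason in outside_validated(today, ranges):
    print(name, "->", reason)
```

In this invented example, humidity is well above anything seen during validation and coolant concentration was never recorded, so the validation says nothing about either.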

Seek Out the Failures You Are Not Seeing

Actively look for failures that your current systems might be
missing. Conduct internal audits that specifically look for process
conditions that are not being monitored, failure modes that are not
being tracked, and data that is not being captured.
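Part of such an audit can be mechanical: compare the conditions that keep appearing in failure investigations against the parameters the control plan actually monitors. The Python sketch below assumes both are available as simple lists of condition names; the names and the `unmonitored_conditions` helper are hypothetical.

```python
from collections import Counter

def unmonitored_conditions(control_plan, investigations):
    """Count how often each condition is cited in investigations but absent from the control plan."""
    monitored = set(control_plan)
    cited = Counter(condition for inv in investigations for condition in inv)
    return {c: n for c, n in cited.most_common() if c not in monitored}

# Hypothetical data: parameters on the control plan, and the conditions
# each failure investigation identified as contributing.
control_plan = ["melt_temp", "hold_pressure", "cycle_time"]
investigations = [
    ["melt_temp", "tool_wear"],
    ["tool_wear", "coolant_conc"],
    ["ambient_humidity"],
]

gaps = unmonitored_conditions(control_plan, investigations)
print(gaps)  # {'tool_wear': 2, 'coolant_conc': 1, 'ambient_humidity': 1}
```

Tool wear is cited in two investigations and monitored nowhere, which makes it a natural first candidate for the control plan.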

Talk to operators, technicians, and engineers about the problems they
see that never make it into the formal reporting system. These informal
observations are often the most valuable data you have, because they
represent the failures that your formal systems were never designed to
capture.

Test Your Assumptions About What Works

Design experiments that deliberately test the boundaries of your
processes. Run process parameters at the edges of the established
window, not just at the center. Test materials from alternative
suppliers, not just from the approved list. Challenge the assumptions
that underpin your quality system, not because they are necessarily
wrong, but because untested assumptions are indistinguishable from
survivorship bias.
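For a small number of continuous parameters, a classic way to run at the edges of a window is a two-level full factorial design: every combination of each parameter’s low and high limit, usually with a center point added as a reference. The Python sketch below generates that run list. The parameter names and limits are hypothetical; in practice the run order would be randomized, and the number of factors kept small, since the run count doubles with each one.

```python
from itertools import product

def boundary_design(window):
    """Two-level full factorial over each parameter's (low, high) limits, plus a center point."""
    names = list(window)
    corners = [dict(zip(names, combo)) for combo in product(*(window[n] for n in names))]
    center = {n: (window[n][0] + window[n][1]) / 2 for n in names}
    return corners + [center]

# Hypothetical molding parameters and their established window limits.
window = {
    "melt_temp_c": (230, 250),
    "hold_pressure_bar": (600, 800),
    "cool_time_s": (12, 18),
}
design = boundary_design(window)
for run in design:
    print(run)
```

Three parameters give eight corner runs plus one center run. Corners such as low temperature, low pressure, and short cooling are exactly the conditions that a process history centered on nominal settings never exercises.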

Abraham Wald did not have data on the planes that did not return. But
he had the intellectual honesty to recognize that the absence of data
was itself the most important data point. In quality management, the
failures you do not see are more dangerous than the ones you do —
because the ones you see, you can fix.

The Uncomfortable Truth

Every quality system is built on a foundation of assumptions about
what works and what does not. Those assumptions are derived from
experience — but experience is inherently filtered through survival. The
processes, suppliers, and practices that have survived in your
organization are not necessarily the best ones. They are the ones that
happened to work under the specific conditions your organization has
encountered.

If those conditions change — and they will — the survivors may fail.
And if you have studied only the survivors, you will not understand
why.

The organizations that achieve the highest quality are not the ones
that study their successes most carefully. They are the ones that study
their failures most honestly — including the failures they never saw
coming, the failures they dismissed as anomalies, and the failures that
are still hidden in the blind spots created by the very systems they
trust to protect them.

Wald’s insight saved bombers. The same insight can save your quality
system. Stop looking only at where the bullet holes are. Start asking
where they are not — and why.


Peter Stasko is a Quality Architect with over 25
years of experience in manufacturing quality systems, process
optimization, and organizational behavior. He writes about the
intersection of human psychology and industrial quality at
iaec.online.
