Quality MSA: When Your Organization Discovers That Every Decision It Ever Made Was Built on Measurements Nobody Ever Validated — and the Data Everyone Trusted Became the Illusion Everyone Followed

The Measurement Nobody Questioned

There is a moment in every quality professional’s career when they
realize something uncomfortable: the numbers they’ve been trusting might
be lying. Not because someone is dishonest. Not because the instruments
are broken. But because the entire measurement system — the gauge, the
operator, the method, the environment, the interaction between all of
these — was never validated as capable of telling the truth in the first
place.

I remember standing in a CNC machining facility that produced
precision shafts for a Tier 1 automotive supplier. The line had been
running for eighteen months. Every single shaft had been measured, every
dimension recorded, every control chart maintained with religious
discipline. SPC was the religion, and the operators were devout.

Then the customer sent back 12,000 shafts.

The investigation took three weeks. The root cause took thirty
seconds once someone finally asked the right question: Has anyone
ever done an MSA on the measurement system?

No one had. The gauges were calibrated — the certificates were
current and framed on the wall. But calibration and measurement system
analysis are not the same thing. Calibration tells you the instrument
reads correctly under controlled laboratory conditions. MSA tells you
whether the instrument, in the hands of your operators, in your
environment, measuring your parts, produces results you can trust.

It doesn’t matter how precisely calibrated your gauge is if the
operator holding it applies different pressure each time. It doesn’t
matter how accurate your coordinate measuring machine is if the fixture
locates the part differently on every run. And it doesn’t matter how
many control charts you maintain if the variation you’re plotting is
measurement noise, not process variation.

The shafts were fine. The measurement system was the problem. The
customer had used a different method, one that was actually capable, and
found what the supplier’s system had been hiding in plain sight: a
measurement process so inconsistent that it couldn’t distinguish between
a good part and a bad one.

Twelve thousand shafts scrapped. Not because the process failed.
Because the measurement system was never validated.

What MSA Actually Measures, and Why Most Organizations Get It Wrong

Measurement Systems Analysis is a structured methodology for
evaluating the amount of variation introduced by the measurement process
itself. It answers a deceptively simple question: Can your
measurement system reliably distinguish between parts that are
different?

Most organizations conflate MSA with calibration. They are related
but fundamentally different:

  • Calibration verifies that an instrument’s readings
    align with known reference standards. It answers: Is the gauge
    accurate?
  • MSA evaluates the entire measurement system —
    instrument, appraiser, method, environment, and their interactions. It
    answers: Is the measurement system capable?

You can have a perfectly calibrated gauge that produces useless
measurements. The calibration certificate confirms the gauge reads a
standard correctly. MSA confirms the gauge reads your parts
correctly in your shop with your people. These are
completely different questions.

The IATF 16949 standard requires MSA for all measurement systems
referenced in the control plan. Most automotive suppliers comply —
technically. They run the studies, file the reports, and present them
during audits. But compliance and competence are not the same. Many
organizations treat MSA as a paperwork exercise rather than what it
actually is: the foundation of every data-driven decision the
organization makes.

If your measurement system cannot reliably detect the variation in
your process, then:

  • Your SPC charts are noise, not signal
  • Your capability indices (Cp, Cpk) are fiction
  • Your scrap decisions are gambling, not quality control
  • Your sorting operations are Theater of Quality — activity that looks
    decisive but accomplishes nothing reliable
  • Your supplier scorecards are measuring measurement error, not
    supplier performance

The Five Sources of Measurement Variation

Every measurement result contains both the true part variation and
the variation introduced by the measurement process. MSA decomposes the
measurement process's contribution into five components:

1. Bias

Bias is the difference between the observed average measurement and
the true reference value. If your gauge consistently reads 0.02 mm
higher than the actual dimension, that’s bias. It’s systematic error —
predictable and correctable, but only if you know it exists.

Bias is the easiest component to detect and fix. A simple bias study,
comparing the average of repeated measurements against a known reference
value, reveals it immediately. Yet many organizations never perform this
study because their calibration certificate says “passed,” and they
assume that’s sufficient.
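
A bias check of this kind can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function name `bias_study` and the ten readings are hypothetical, and the t-statistic assumes the readings are independent and roughly normal.

```python
import math
import statistics

def bias_study(readings, reference):
    """Return (bias, t_statistic) for repeated readings of one reference standard."""
    n = len(readings)
    bias = statistics.mean(readings) - reference
    # Standard error of the mean, from the sample standard deviation
    std_err = statistics.stdev(readings) / math.sqrt(n)
    return bias, bias / std_err

# Hypothetical data: ten readings of a 25.000 mm reference standard
readings = [25.02, 25.01, 25.03, 25.02, 25.02, 25.01, 25.03, 25.02, 25.02, 25.02]
bias, t_stat = bias_study(readings, reference=25.000)
print(f"bias = {bias:+.3f} mm, t = {t_stat:.1f}")
```

A t-statistic far from zero suggests the offset is systematic rather than noise in the readings, which is exactly the case where a correction is worth applying.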

2. Repeatability (Equipment Variation)

Repeatability is the variation in measurements taken by the same
operator, using the same instrument, on the same part, over a short
period. It answers: How consistent is the gauge itself?

Poor repeatability means the instrument produces different readings
when measuring the same thing repeatedly. This is the most fundamental
source of measurement error. If your gauge can’t agree with itself,
nothing downstream matters.

Common causes include instrument wear, inadequate resolution, poor
fixture design, thermal instability, and insufficient measurement force
control. I once saw a micrometer that produced different readings
depending on which side of the anvil the part was placed — a wear
pattern so subtle that no calibration lab would catch it, but so
significant that it made every measurement unreliable.

3. Reproducibility (Appraiser Variation)

Reproducibility is the variation in measurements taken by different
operators using the same instrument on the same parts. It answers:
How much does the person holding the gauge matter?

In many organizations, reproducibility is the dominant source of
measurement error. Different operators interpret measurement methods
differently. They apply different pressures. They position parts in
fixtures differently. They read analog scales from different angles.
They round differently. They follow the same procedure but execute it
with subtle variations that compound into significant disagreement.

I’ve witnessed Gage R&R studies where the reproducibility
variation was three times larger than the repeatability variation. The
instrument was excellent. The operators were skilled. But the method was
ambiguous enough that each operator developed their own interpretation,
and no one had ever measured the disagreement because no one had ever
done an MSA.

4. Stability

Stability is the change in measurement system performance over time.
A gauge that was accurate last year may have drifted. An operator who
was consistent during training may have developed shortcuts. A method
that worked in winter may produce different results in summer because
the shop temperature affects both the instrument and the part.

Stability is the most neglected component of MSA because it requires
ongoing monitoring, not a one-time study. Most organizations perform
their initial MSA, file the report, and never think about it again — as
if measurement systems are static objects immune to the passage of time,
wear, and organizational change.
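
One common way to monitor stability is to measure the same master part at regular intervals and chart the results. The sketch below assumes an individuals chart with limits from the average moving range; the function name `individuals_limits` and all readings are hypothetical.

```python
import statistics

def individuals_limits(values):
    """Lower limit, center line, and upper limit for an individuals (X) chart."""
    center = statistics.mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.mean(moving_ranges)
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of two readings
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

# Hypothetical weekly readings of one master part (mm) taken after the initial MSA
baseline = [10.001, 10.000, 10.002, 9.999, 10.001,
            10.000, 10.001, 9.999, 10.000, 10.001]
lcl, center, ucl = individuals_limits(baseline)

# Later readings of the same master part, checked against the baseline limits
later = [10.001, 10.003, 10.006, 10.008]
drifted = [x for x in later if not lcl <= x <= ucl]
print(f"limits: {lcl:.4f} to {ucl:.4f}, out-of-limit readings: {drifted}")
```

Because the master part itself does not change, any reading outside the limits points at the measurement system rather than the process.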

5. Linearity

Linearity is the change in bias across the measurement range. A gauge
might be perfectly accurate at 25.000 mm but biased by 0.01 mm at 25.500
mm. If you only validate bias at one reference point, you’ll never
discover that your measurements become less trustworthy as the dimension
moves away from that point.

This is particularly critical for instruments used across wide
measurement ranges — calipers, CMMs, and height gauges being common
culprits. The organization assumes accuracy everywhere because the
calibration certificate says accurate somewhere.
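
A linearity check can be sketched as a straight-line fit of bias against reference size. The function name `linearity_fit` and the bias values are hypothetical; a real study would average several readings at each reference point and test whether the slope is statistically distinguishable from zero.

```python
def linearity_fit(references, biases):
    """Least-squares line: bias = intercept + slope * reference."""
    n = len(references)
    mean_x = sum(references) / n
    mean_y = sum(biases) / n
    sxx = sum((x - mean_x) ** 2 for x in references)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(references, biases))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical average bias (mm) observed at five reference sizes (mm)
references = [25.000, 25.125, 25.250, 25.375, 25.500]
biases = [0.000, 0.002, 0.005, 0.008, 0.010]
slope, intercept = linearity_fit(references, biases)
print(f"bias changes by {slope:.4f} mm per mm of size")
```

A slope near zero means the bias is roughly constant across the range; a clearly nonzero slope is the situation described above, where one calibration point says nothing about the others.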

The Gage R&R Study: Your Measurement System’s Report Card

The most common MSA tool is the Gage Repeatability and
Reproducibility study. The standard format is straightforward:

  • Select 10 parts that represent the actual process variation
  • Select 3 operators who normally perform the measurement
  • Each operator measures each part 3 times in random order
  • Analyze the results

The analysis produces a decomposition of total observed variation
into its components:

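%GRR (Gage Repeatability and Reproducibility) = the
percentage of total variation attributable to the measurement
system, conventionally computed as the ratio of the measurement-system
standard deviation to the total standard deviation.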

The acceptance criteria are well-established:

%GRR          Decision
Under 10%     Acceptable: measurement system is adequate
10% to 30%    Marginal: may be acceptable depending on the application,
              the cost of improvement, and the risk
Over 30%      Unacceptable: the measurement system cannot reliably
              distinguish between parts
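
To make the decomposition concrete, here is a minimal sketch of a crossed Gage R&R analysis by the ANOVA method, assuming a balanced study (every operator measures every part the same number of times). The function name `gage_rr_anova`, the data layout, and the simulated readings are all hypothetical; commercial statistics packages add confidence intervals and significance tests that this sketch omits.

```python
import math
import random

def gage_rr_anova(data):
    """Crossed Gage R&R by the ANOVA method (balanced design assumed).

    data[i][j] holds the repeat readings by operator i on part j.
    Returns variance components and %GRR relative to total variation.
    """
    n_ops, n_parts, n_trials = len(data), len(data[0]), len(data[0][0])
    values = [x for op in data for cell in op for x in cell]
    grand = sum(values) / len(values)

    cell_mean = [[sum(cell) / n_trials for cell in op] for op in data]
    op_mean = [sum(row) / n_parts for row in cell_mean]
    part_mean = [sum(row[j] for row in cell_mean) / n_ops for j in range(n_parts)]

    # Sums of squares for parts, operators, interaction, and repeat error
    ss_part = n_ops * n_trials * sum((m - grand) ** 2 for m in part_mean)
    ss_op = n_parts * n_trials * sum((m - grand) ** 2 for m in op_mean)
    ss_cells = n_trials * sum((m - grand) ** 2 for row in cell_mean for m in row)
    ss_inter = ss_cells - ss_part - ss_op
    ss_error = sum((x - cell_mean[i][j]) ** 2
                   for i in range(n_ops) for j in range(n_parts)
                   for x in data[i][j])

    ms_part = ss_part / (n_parts - 1)
    ms_op = ss_op / (n_ops - 1)
    ms_inter = ss_inter / ((n_parts - 1) * (n_ops - 1))
    ms_error = ss_error / (n_parts * n_ops * (n_trials - 1))

    # Variance components; negative estimates are truncated to zero
    repeatability = ms_error
    interaction = max(0.0, (ms_inter - ms_error) / n_trials)
    operator = max(0.0, (ms_op - ms_inter) / (n_parts * n_trials))
    part = max(0.0, (ms_part - ms_inter) / (n_ops * n_trials))
    grr = repeatability + operator + interaction

    return {
        "repeatability": repeatability,
        "reproducibility": operator + interaction,
        "part": part,
        "pct_grr": 100 * math.sqrt(grr / (grr + part)),
    }

# Simulated study: 10 parts, 3 operators, 3 trials (all values hypothetical)
random.seed(1)
true_sizes = [random.gauss(25.0, 0.05) for _ in range(10)]
operator_offsets = [0.000, 0.005, -0.005]
study = [[[size + offset + random.gauss(0, 0.01) for _ in range(3)]
          for size in true_sizes]
         for offset in operator_offsets]

result = gage_rr_anova(study)
print(f"%GRR = {result['pct_grr']:.1f}%")
```

The returned `repeatability` and `reproducibility` values are variances, so comparing them shows at a glance whether the gauge or the people holding it dominate the measurement error.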

But here’s what most practitioners miss: the %GRR depends on the part
variation in your study. If you select 10 parts that are nearly
identical, your %GRR will be artificially high because the measurement
system is being asked to distinguish between parts that are barely
different. If you select 10 parts spanning the full specification range,
your %GRR will be lower. The parts you choose for the study influence
the results — which means the study design itself requires judgment, not
just procedural compliance.
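
The arithmetic behind this effect is short enough to show directly. The sketch assumes measurement error and part variation are independent, so their variances add; `pct_grr` and the standard deviations are hypothetical values chosen for illustration.

```python
import math

def pct_grr(sigma_ms, sigma_part):
    """%GRR relative to total variation, from standard deviations."""
    sigma_total = math.sqrt(sigma_ms ** 2 + sigma_part ** 2)
    return 100 * sigma_ms / sigma_total

sigma_ms = 0.010  # hypothetical measurement-system standard deviation, mm

narrow = pct_grr(sigma_ms, sigma_part=0.010)  # study parts nearly identical
wide = pct_grr(sigma_ms, sigma_part=0.050)    # study parts span the process range
print(f"narrow part spread: {narrow:.1f}%  wide part spread: {wide:.1f}%")
```

The same gauge, with the same measurement error, lands on opposite sides of the 30% threshold depending only on which parts went into the study.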

There’s also the distinction between %GRR relative to total variation
and %GRR relative to tolerance. If your process is highly capable (tight
distribution well within spec), %GRR relative to total variation might
look poor even though the measurement system is perfectly adequate for
making accept/reject decisions. %GRR relative to tolerance tells you
whether the measurement system can reliably determine if a part is in or
out of spec — which is often the more practical question.
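
The two ratios can be compared side by side. This sketch uses hypothetical numbers for a tight process inside a wide tolerance; the 6-sigma spread multiplier in `pct_grr_tolerance` is an assumption (some references use 5.15 instead), and both function names are illustrative.

```python
import math

def pct_grr_total(sigma_ms, sigma_part):
    """%GRR relative to total observed variation."""
    return 100 * sigma_ms / math.sqrt(sigma_ms ** 2 + sigma_part ** 2)

def pct_grr_tolerance(sigma_ms, lsl, usl, spread=6.0):
    """%GRR relative to the tolerance width (precision-to-tolerance ratio)."""
    return 100 * spread * sigma_ms / (usl - lsl)

# Hypothetical case: a highly capable process well inside its tolerance
sigma_ms, sigma_part = 0.002, 0.003   # mm
lsl, usl = 24.95, 25.05               # mm

to_total = pct_grr_total(sigma_ms, sigma_part)
to_tolerance = pct_grr_tolerance(sigma_ms, lsl, usl)
print(f"%GRR to total variation: {to_total:.1f}%")
print(f"%GRR to tolerance:       {to_tolerance:.1f}%")
```

Which ratio matters depends on what the data is for: process monitoring and improvement lean on the first, accept/reject decisions on the second.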

The Crossed vs. Nested Design: Why It Matters

Most automotive Gage R&R studies use a crossed design: every
operator measures every part. This is appropriate when the measurement
is non-destructive — the part doesn’t change during measurement.

But when measurement is destructive (tensile testing, hardness
testing, chemical analysis), no one can measure the same part twice,
whether the same operator or a different one. Each test destroys the
specimen. In these cases, you need a nested design where “repeatability”
is estimated from multiple specimens from the same homogeneous batch.

Many organizations don’t understand this distinction. They attempt
crossed designs on destructive tests, selecting parts that are “similar
enough” and pretending they’re the same part. This produces GRR numbers
that look like science but are actually wishful thinking.

The correct approach for destructive testing requires careful batch
selection and statistical methods that account for within-batch
variation. It’s more complex, less standardized, and more dependent on
statistical expertise. Which means it’s exactly where most organizations
cut corners.

Attribute MSA: When Pass/Fail Isn’t as Simple as It Sounds

Not all measurement systems produce continuous data. Visual
inspection, go/no-go gauges, torque verification with pass/fail
indicators, and sensory evaluations produce attribute data: accept or
reject.

Attribute MSA evaluates the effectiveness of these systems through
methods like the Attribute Agreement Analysis or the Kappa study. The
key metrics are:

  • Effectiveness: The percentage of correct decisions
    (matching the known standard)
  • Miss Rate: The percentage of bad parts accepted
    (false accepts)
  • False Alarm Rate: The percentage of good parts
    rejected (false rejects)
  • Kappa Statistic: A measure of agreement between
    operators, corrected for agreement that would occur by chance

Attribute measurement systems are notoriously unreliable. Study after
study has demonstrated that visual inspection effectiveness typically
ranges from 60% to 85%, meaning even trained inspectors make the wrong
call on 15% to 40% of the parts they judge. When the defect is subtle,
the lighting is poor, the inspector is fatigued, or the shift is long,
effectiveness drops even further.

I once facilitated an attribute MSA study at a pharmaceutical
packaging facility. The inspection was visual — operators checking
filled vials for particulate contamination. The study revealed that
three experienced inspectors agreed with each other only 62% of the time
and agreed with the known standard only 71% of the time. For twenty
years, this organization had been making batch release decisions based
on a measurement system that was wrong nearly one-third of the time.

The solution wasn’t to retrain the operators — they were already
well-trained. The solution was to redesign the measurement system:
better lighting, magnification, standardized viewing distance, defined
inspection time per vial, and mandatory breaks to combat fatigue. The
redesigned system achieved 94% effectiveness. But no one had ever
measured the measurement system before, so no one knew it needed
redesigning.

The Hidden Cost of Ignoring MSA

When organizations skip MSA or treat it as a checkbox exercise, the
consequences cascade through every quality system:

Wasted SPC effort. If your measurement system
contributes 40% of the observed variation, a large share of what your
control chart plots is measurement noise. The control limits are
inflated. The process appears out of control when it’s actually stable,
and appears stable when it’s actually drifting. Operators chase phantom
special causes while real process shifts go undetected.

Meaningless capability indices. Your Cpk of 1.67? If
GRR is 30% of total variation, the real process capability is higher
than what you’re reporting — you’re underestimating your process because
the measurement noise is inflating the denominator. Or worse: if the
bias is systematic and uncorrected, your process might be closer to the
specification limit than your data suggests.
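
The size of the understatement can be estimated with a short calculation. This sketch assumes %GRR is expressed as a ratio of standard deviations, that measurement error is independent of the part, and that there is no bias; `corrected_cpk` is a hypothetical helper name.

```python
import math

def corrected_cpk(observed_cpk, pct_grr):
    """Estimate process-only Cpk by removing measurement variance from sigma."""
    ratio = pct_grr / 100.0
    # sigma_process = sigma_observed * sqrt(1 - ratio^2), and Cpk scales as 1/sigma
    return observed_cpk / math.sqrt(1 - ratio ** 2)

process_cpk = corrected_cpk(observed_cpk=1.67, pct_grr=30)
print(f"observed Cpk 1.67 at 30% GRR implies process Cpk of about {process_cpk:.2f}")
```

The correction only addresses the spread. An uncorrected bias shifts the apparent process mean, which this calculation cannot see.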

Wrong process improvement priorities. When you
launch a Six Sigma project to reduce variation in a process, you first
need to know how much of the observed variation is actually process
variation and how much is measurement error. If 50% of the variation
you’re trying to reduce comes from the measurement system, you’re
spending half your improvement budget on the wrong problem.

Supplier disputes. Your supplier ships you parts
they measured as conforming. You measure the same parts and find them
non-conforming. Who’s right? Without MSA on both sides, the answer is:
neither of you knows. The dispute isn’t about the parts — it’s about the
measurement systems, and neither party has validated theirs.

Unnecessary scrap. Parts near the specification
boundary are the most vulnerable. If your measurement system has high
GRR, a part that is truly at the specification limit might measure
in-spec on one trial and out-of-spec on the next. You’re scrapping good
parts and shipping bad ones, and the measurement system can’t tell you
which is which.
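
How often a borderline part gets the wrong verdict can be estimated if you assume normally distributed, unbiased measurement error. The function name `prob_measures_in_spec`, the limits, and the standard deviation below are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_measures_in_spec(true_value, lsl, usl, sigma_ms):
    """Probability that a single reading of this part falls inside the limits."""
    upper = normal_cdf((usl - true_value) / sigma_ms)
    lower = normal_cdf((lsl - true_value) / sigma_ms)
    return upper - lower

lsl, usl, sigma_ms = 24.95, 25.05, 0.01  # mm, hypothetical

at_limit = prob_measures_in_spec(25.05, lsl, usl, sigma_ms)    # exactly at USL
just_inside = prob_measures_in_spec(25.04, lsl, usl, sigma_ms)  # good part near USL
print(f"part at the limit passes {at_limit:.0%} of the time")
print(f"good part 0.01 mm inside the limit passes {just_inside:.0%} of the time")
```

Under these assumptions, a conforming part one measurement standard deviation inside the limit is still rejected roughly one time in six.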

Building a Competent MSA Practice

Moving from MSA compliance to MSA competence requires several
shifts:

Plan for measurement system validation, not just calibration.
Calibration schedules are necessary but
insufficient. Every measurement system in your control plan should have
an initial MSA, periodic reassessment, and triggered reassessment when
something changes — new operator, instrument repair, method revision, or
observed measurement anomalies.

Train your people on what MSA results actually mean.
Most quality engineers can run a Gage R&R study. Fewer can interpret
the results in context, select appropriate parts for the study,
distinguish between crossed and nested designs, or translate MSA
findings into actionable improvement plans. This is analytical
competence, not procedural compliance.

Connect MSA to SPC. Your SPC system should reference
the MSA status of every measurement system feeding it. If a gauge’s GRR
exceeds 30%, the data from that gauge should be flagged as unreliable.
Don’t chart noise and call it process monitoring.

Treat the measurement system as a process. Like any
process, it has inputs (parts, operators, instruments, environment,
methods), outputs (measurement results), and sources of variation.
Manage it with the same discipline you apply to your manufacturing
processes. Control plans for measurement systems aren’t exotic — they’re
essential.

Close the loop. When an MSA study reveals a problem,
fix it. This sounds obvious, but I’ve seen organizations file MSA
reports with %GRR above 40% and take no corrective action because “the
customer hasn’t asked about it yet.” The measurement system is broken,
the data is unreliable, and the organization proceeds as if nothing is
wrong because no external auditor has flagged it.

The Deeper Implication: Epistemic Humility in Quality Management

MSA teaches something that extends beyond measurement: the quality of
your decisions is limited by the quality of your data, and the quality
of your data is limited by the quality of your measurement systems.

Every control chart, every capability study, every hypothesis test,
every statistical analysis you perform assumes that the data accurately
represents reality. MSA is the discipline that tests this assumption.
Without it, you’re not making data-driven decisions — you’re making
measurement-error-driven decisions and calling them data-driven.

This is epistemic humility: the recognition that what you think you
know might be wrong, not because you’re foolish, but because your
instruments of knowing are imperfect. In quality management, the
measurement system is the instrument of knowing. Validating it isn’t
bureaucracy — it’s intellectual honesty.

The organizations that take MSA seriously are the ones that make
decisions with confidence, not because they trust their intuition, but
because they’ve earned the right to trust their data. They know the
limits of their measurement systems. They know where the data is strong
and where it’s weak. They make different decisions in different
measurement contexts because they understand that the same specification
means different things depending on the measurement system used to
evaluate it.

The 12,000 shafts I mentioned at the beginning? The customer accepted
the explanation, the supplier invested in proper measurement systems,
and the problem never recurred. But the lesson should have been learned
before the scrap, not after. The MSA study that would have prevented the
entire incident would have taken two days and cost almost nothing
compared to the scrapped shipment.

The data you trust is only as good as the measurement system that
produced it. If you haven’t validated the system, you haven’t validated
the data. And if you haven’t validated the data, every decision built on
it stands on a foundation you’ve never tested.

That’s not quality management. That’s faith.


Peter Stasko is a Quality Architect with 25+ years
of experience transforming organizations across automotive, aerospace,
and pharmaceutical industries. He has led MSA implementation programs
that transformed measurement practices from compliance exercises into
strategic decision-support systems, and he’s never looked at a control
chart the same way since discovering what was really behind the
numbers.
