Measurement System Analysis: When Your Gage R&R Study Proves Your Measurement System Is Fine While Your Operators Keep Getting Different Answers — and the Precision You Calculated Became the Accuracy You Mistook for Truth

Every manufacturing engineer has been there. You run a Gage R&R
study, the numbers come back under 10%, everyone nods approvingly, and
the study gets filed away in a binder that nobody opens again.
Meanwhile, on the production floor, Operator A measures a part and gets
12.47 mm. Operator B measures the same part and gets 12.52 mm. The
tolerance is ±0.05 mm, and the upper limit sits between the two
readings. One says it passes. The other says it fails. And your
perfectly compliant Gage R&R study offers zero help in resolving the
dispute.

This is the Measurement System Analysis paradox: the tool designed to
tell you whether you can trust your measurements has itself become a
ritual that organizations perform to feel confident about measurements
they should not trust. The study passes, the auditor is satisfied, but
the measurement system is still broken. You just spent three days
proving something that isn’t true.

What Measurement System Analysis Actually Is

Let’s start with the honest version. Measurement System Analysis
(MSA) is the discipline of understanding how much of the variation you
observe in your process data comes from the process itself and how much
comes from the act of measuring it. It is, in its simplest form, an
attempt to answer one question: can you trust the numbers you’re
collecting?

The fundamental equation is brutally straightforward, provided you
state it in variances rather than standard deviations:

Total Observed Variance = Process Variance + Measurement Variance

If your measurement variation is large relative to your process
variation, you are making decisions based on noise. You are chasing
ghosts. You are rejecting good parts and accepting bad ones, and you
have no idea which is which.
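As a rough illustration of the relationship above, the sketch below assumes the process and measurement errors are independent, so their variances (squared standard deviations) add. The function names and the sigma values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def observed_sigma(sigma_process: float, sigma_measurement: float) -> float:
    """Standard deviation of the observed data.

    Assumes independent process and measurement errors, so the
    variances add and the standard deviations do not.
    """
    return math.sqrt(sigma_process**2 + sigma_measurement**2)


def measurement_share_of_variance(sigma_process: float,
                                  sigma_measurement: float) -> float:
    """Fraction of the observed variance that comes from measuring."""
    total_variance = sigma_process**2 + sigma_measurement**2
    return sigma_measurement**2 / total_variance


# Hypothetical standard deviations in mm, for illustration only.
sigma_p = 0.020
sigma_m = 0.015
print(f"observed sigma: {observed_sigma(sigma_p, sigma_m):.4f} mm")
print(f"measurement share of variance: "
      f"{measurement_share_of_variance(sigma_p, sigma_m):.0%}")
```

With these made-up numbers, a gage whose standard deviation is three quarters of the process standard deviation accounts for roughly a third of the variance you see in the data.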

The automotive industry formalized this through the AIAG MSA manual,
which has gone through multiple editions and remains one of the most
referenced — and most misunderstood — documents in quality engineering.
The core tool, the Gage Repeatability and Reproducibility (Gage R&R)
study, attempts to decompose measurement variation into two
components:

  • Repeatability (Equipment Variation, EV): Can the
    same operator, using the same gage, measuring the same part, get the
    same answer repeatedly?
  • Reproducibility (Appraiser Variation, AV): Can
    different operators, using the same gage, measuring the same parts,
    agree with each other?
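One way to sketch this decomposition is the two-way ANOVA method for a crossed study, in which every operator measures every part several times. The post does not say which calculation method is used, so treat this as an assumption; the function name, the data layout, and the readings are all hypothetical. Reproducibility here includes the operator-by-part interaction, and negative variance estimates are clipped to zero.

```python
import math
from statistics import mean


def gage_rr_anova(data):
    """Variance components for a crossed Gage R&R study.

    data[i][j][k] is the reading of part i by operator j on trial k.
    Requires at least 2 parts, 2 operators, and 2 trials.
    """
    p, o, r = len(data), len(data[0]), len(data[0][0])
    grand = mean(x for part in data for cell in part for x in cell)
    part_means = [mean(x for cell in part for x in cell) for part in data]
    op_means = [mean(x for part in data for x in part[j]) for j in range(o)]
    cell_means = [[mean(cell) for cell in part] for part in data]

    # Sums of squares for parts, operators, interaction, and error.
    ss_part = o * r * sum((m - grand) ** 2 for m in part_means)
    ss_op = p * r * sum((m - grand) ** 2 for m in op_means)
    ss_cell = r * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss_int = ss_cell - ss_part - ss_op
    ss_err = sum((x - cell_means[i][j]) ** 2
                 for i in range(p) for j in range(o) for x in data[i][j])

    # Mean squares.
    ms_part = ss_part / (p - 1)
    ms_op = ss_op / (o - 1)
    ms_int = ss_int / ((p - 1) * (o - 1))
    ms_err = ss_err / (p * o * (r - 1))

    # Variance components, clipped at zero.
    var_repeat = ms_err
    var_int = max(0.0, (ms_int - ms_err) / r)
    var_op = max(0.0, (ms_op - ms_int) / (p * r))
    var_part = max(0.0, (ms_part - ms_int) / (o * r))
    var_reprod = var_op + var_int
    var_grr = var_repeat + var_reprod
    var_total = var_grr + var_part

    pct = 100 * math.sqrt(var_grr / var_total) if var_total > 0 else float("nan")
    return {
        "repeatability": var_repeat,
        "reproducibility": var_reprod,
        "grr": var_grr,
        "part": var_part,
        "total": var_total,
        "pct_grr": pct,  # percent of total variation, as a ratio of sigmas
    }


# Hypothetical readings in mm: 3 parts x 2 operators x 2 trials.
readings = [
    [[12.47, 12.48], [12.51, 12.52]],
    [[12.40, 12.41], [12.44, 12.45]],
    [[12.55, 12.54], [12.58, 12.59]],
]
for name, value in gage_rr_anova(readings).items():
    print(f"{name}: {value:.6f}")
```

A real study would use more parts, operators, and trials than this toy data set; the point is only to show where the repeatability and reproducibility components come from.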

Together, these give you the Gage R&R number — the percentage of
total variation (or tolerance) that your measurement system consumes.
The widely accepted guidelines say:

  • Under 10%: Acceptable
  • 10-30%: Marginal — acceptable depending on the application
  • Over 30%: Unacceptable — the measurement system needs
    improvement
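The guideline bands above are simple enough to write down as a lookup. This is a minimal sketch; the function name is hypothetical, and treating exactly 10% and exactly 30% as marginal is an assumption, since the guideline wording leaves the boundaries open.

```python
def classify_grr(pct_grr: float) -> str:
    """Map a Gage R&R percentage onto the three guideline bands."""
    if pct_grr < 10:
        return "acceptable"
    if pct_grr <= 30:
        return "marginal"
    return "unacceptable"


for value in (4.8, 18.0, 37.1):
    print(f"{value}% -> {classify_grr(value)}")
```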

Simple enough. A child could understand it. And that’s precisely the
problem.

The First Lie: The Study Conditions vs. Production Reality

When you conduct a Gage R&R study, you create an artificial
environment. You select parts that span the expected range. You select
operators who are trained and available. You conduct the study in a
controlled area with stable temperature and minimal vibration. You use a
gage that was recently calibrated. You give operators time to measure
carefully.

None of these conditions exist on the production floor.

On the production floor, operators measure parts at line speed.
They’re tired. The gage hasn’t been calibrated in weeks. The temperature
swings between shifts. The fixtures are worn. The lighting is different.
The parts are presented in a different orientation. And there’s pressure
— production pressure, the kind that makes you rush the fifth reading
because the line is backing up.

Your Gage R&R study tells you how good your measurement system
can be under ideal conditions. It does not tell you how good it is on a
Tuesday afternoon when everything is going wrong. But organizations
treat the study result as if it describes production reality. It
doesn’t. It describes a laboratory fantasy.

This is not a minor quibble. It’s the difference between a
measurement system that works and one that merely passes a test.

The Second Lie: Percentage of Tolerance vs. Percentage of Variation

Here’s a subtler trap that catches even experienced quality
engineers. There are two ways to express Gage R&R:

As a percentage of total variation: Gage R&R ÷ Total Variation × 100

As a percentage of tolerance: Gage R&R ÷ Tolerance Range × 100

These can give dramatically different results. If your process is
highly capable (Cpk > 2.0), your total variation is tiny relative to
the tolerance. A measurement system that consumes more than 30% of your
total variation might consume only 5% of your tolerance. The study
passes by the tolerance method but fails by the variation method.

Which one should you use? It depends on what you’re trying to do. If
you’re using the measurement system for process control (SPC charts,
trend analysis), you care about variation — because measurement noise
masks real process shifts. If you’re using it for inspection (pass/fail
decisions against a specification), you care about tolerance.
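To make the gap between the two metrics concrete, here is a small sketch that computes both from the same measurement standard deviation. It assumes the common convention of a six-sigma spread for the tolerance ratio and a ratio of standard deviations for the variation ratio; the function names, specification limits, and sigma values are hypothetical.

```python
import math


def pct_grr_of_total_variation(sigma_grr: float, sigma_process: float) -> float:
    """Gage R&R as a percentage of total observed variation (sigma ratio)."""
    sigma_total = math.sqrt(sigma_grr**2 + sigma_process**2)
    return 100 * sigma_grr / sigma_total


def pct_grr_of_tolerance(sigma_grr: float, lsl: float, usl: float,
                         spread: float = 6.0) -> float:
    """Gage R&R spread as a percentage of the tolerance range."""
    return 100 * spread * sigma_grr / (usl - lsl)


# Hypothetical, very capable process: tolerance of 1.0 mm, tiny process sigma.
sigma_grr = 0.008
sigma_process = 0.020
lsl, usl = 9.5, 10.5

print(f"% of total variation: "
      f"{pct_grr_of_total_variation(sigma_grr, sigma_process):.1f}")
print(f"% of tolerance: {pct_grr_of_tolerance(sigma_grr, lsl, usl):.1f}")
```

With these made-up numbers, the same gage lands above 30% against total variation and below 10% against tolerance, which is exactly the situation where the choice of metric decides the verdict.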

Most organizations pick the method that gives them the passing
result. This is not analysis. This is shopping.

The Third Lie: Linearity and Bias Get Ignored

Gage R&R studies get all the attention because they produce a
single number that fits neatly into a report. But MSA includes other
critical characteristics that routinely get skipped:

Bias: Does your measurement system, on average, read
higher or lower than the true value? If your caliper consistently reads
0.003 mm high, every part you measure is being judged against a shifted
standard. Your Gage R&R could be perfect and you’d still be making
wrong decisions.

Linearity: Does the bias change across the
measurement range? A micrometer might be perfectly accurate at 10 mm but
read 0.008 mm high at 25 mm. If you only study bias at one reference
value, you’ll never catch this.
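Bias and linearity can be sketched in a few lines: bias is the average reading minus the reference value, and linearity is how that bias trends across the range, estimated here with an ordinary least-squares line. The function names are hypothetical, and the data simply mirror the micrometer example above, with an extra made-up midpoint.

```python
from statistics import mean


def bias(readings, reference):
    """Average reading minus the reference (true) value."""
    return mean(readings) - reference


def linearity(bias_by_reference):
    """Least-squares slope and intercept of bias versus reference value.

    bias_by_reference maps each reference value to the bias found there.
    A slope near zero means the bias is roughly constant across the range.
    """
    xs = list(bias_by_reference.keys())
    ys = list(bias_by_reference.values())
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical reference masters (mm) and the bias observed at each.
observed_bias = {10.0: 0.000, 17.5: 0.004, 25.0: 0.008}
slope, intercept = linearity(observed_bias)
print(f"bias grows by {slope * 1000:.3f} micrometres per mm of size")
```

A bias study at the 10 mm master alone would report zero bias and miss the trend entirely.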

Stability: Does the measurement system drift over
time? Your Gage R&R study is a snapshot. If the gage drifts by 0.01
mm per month, the study you ran in January says nothing about what’s
happening in June. But nobody runs Gage R&R studies monthly. They
run them once, file them, and move on.
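A lightweight alternative to rerunning a full study is to measure a reference standard on a regular schedule and plot the readings on an individuals control chart. The sketch below assumes the usual individuals-chart limits (mean plus or minus 2.66 times the average moving range); the function names and readings are hypothetical.

```python
from statistics import mean


def individuals_limits(baseline):
    """Control limits for an individuals chart built from baseline readings."""
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    center = mean(baseline)
    half_width = 2.66 * mean(moving_ranges)
    return center - half_width, center, center + half_width


def out_of_control(readings, lcl, ucl):
    """Indices of readings that fall outside the control limits."""
    return [i for i, x in enumerate(readings) if x < lcl or x > ucl]


# Hypothetical readings of a 10 mm reference standard, in mm.
baseline = [10.000, 10.002, 9.998, 10.001, 9.999]
lcl, center, ucl = individuals_limits(baseline)

later_checks = [10.001, 10.004, 10.011]
print(f"limits: {lcl:.4f} to {ucl:.4f}")
print("flagged checks:", out_of_control(later_checks, lcl, ucl))
```

Five baseline points are far too few for real limits; the short list only keeps the example readable.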

The AIAG manual describes all of these. Most organizations do the
Gage R&R and ignore the rest. It’s like getting a full medical
checkup and only looking at your weight.

The Fourth Lie: Attribute Gage Studies Are a Joke

Not everything gets measured with calipers and CMMs. A huge portion
of manufacturing inspection is visual — pass/fail judgments made by
human operators looking at parts. Scratch? No scratch? Burr? Acceptable
burr? Color match? Good enough?

Attribute Gage R&R studies attempt to quantify the reliability of
these visual inspections. The standard approach is the Attribute
Agreement Analysis: multiple operators evaluate the same set of parts
multiple times, and you calculate agreement percentages both within each
operator (repeatability) and between operators (reproducibility).
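The agreement percentages described above can be sketched as follows. A part counts as "agreed" only if every rating in the group matches; this is one simple definition, and the function names, the "P"/"F" coding, and the ratings are hypothetical.

```python
def within_appraiser_agreement(trials):
    """Percent of parts that one operator rated the same on every trial.

    trials is a list of trials; each trial is a list of decisions,
    one per part, in the same part order.
    """
    n_parts = len(trials[0])
    consistent = sum(
        1 for i in range(n_parts) if len({trial[i] for trial in trials}) == 1
    )
    return 100 * consistent / n_parts


def between_appraiser_agreement(ratings):
    """Percent of parts on which all operators agree on all trials.

    ratings maps each operator name to that operator's list of trials.
    """
    all_trials = [trial for trials in ratings.values() for trial in trials]
    return within_appraiser_agreement(all_trials)


# Hypothetical study: 5 parts, 2 operators, 2 trials each.
ratings = {
    "A": [["P", "P", "F", "F", "P"], ["P", "P", "F", "P", "P"]],
    "B": [["P", "P", "F", "F", "F"], ["P", "P", "F", "F", "F"]],
}
for operator, trials in ratings.items():
    print(operator, within_appraiser_agreement(trials))
print("between", between_appraiser_agreement(ratings))
```

Raw percent agreement does not correct for agreement that would happen by chance, which is why attribute studies often report a kappa statistic alongside it.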

Here’s what actually happens. The engineer selects 20-30 parts, some
known good, some known bad, some borderline. The operators evaluate
them. The borderline parts drive disagreement — because “borderline”
literally means “reasonable people can disagree.” The study reveals that
your visual inspection system has 65-75% agreement, well below the 90%
commonly cited as the minimum acceptable level. The corrective action
is… what, exactly?

You can’t calibrate a human eye the way you calibrate a micrometer.
The fix for visual inspection disagreement is better standards —
physical boundary samples, better lighting, magnification, clearer
criteria — not more Gage R&R studies. But the study is what gets
done because the study is what the auditor asks for.

The Fifth Lie: The Parts You Select Determine the Result

A Gage R&R study requires you to select sample parts that
represent the expected process variation. This selection is subjective
and has enormous influence on the outcome.

If you select parts that cluster tightly around the nominal, your
total observed variation will be small, and the measurement variation
will look large by comparison. Your Gage R&R percentage will be
high, and the study will “fail.”

If you select parts that span a wide range — perhaps deliberately
including some near the specification limits — your total observed
variation will be large, and the measurement variation will look small.
Your Gage R&R percentage will be low, and the study will “pass.”

Same measurement system. Same operators. Same gage. Different parts.
Different result. The study is not objective. It is sensitive to
sampling, and the person selecting the parts usually knows what result
is needed.
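The sensitivity to part selection is easy to demonstrate with arithmetic. The sketch below holds the measurement standard deviation fixed and changes only the spread of the selected parts. As a simplification, it treats the listed part sizes as true values and uses their sample standard deviation as the part variation; the function name and all numbers are hypothetical.

```python
import math
from statistics import stdev


def pct_grr_for_selection(sigma_measurement, part_sizes):
    """Gage R&R as a percentage of total variation for a chosen part set."""
    sigma_parts = stdev(part_sizes)
    sigma_total = math.sqrt(sigma_measurement**2 + sigma_parts**2)
    return 100 * sigma_measurement / sigma_total


# Same gage and operators in both cases: measurement sigma of 0.01 mm.
sigma_measurement = 0.01

tight_parts = [12.49, 12.50, 12.51, 12.50, 12.50]  # clustered at nominal
wide_parts = [12.45, 12.48, 12.50, 12.52, 12.55]   # spanning the tolerance

print(f"tight selection: "
      f"{pct_grr_for_selection(sigma_measurement, tight_parts):.1f}%")
print(f"wide selection: "
      f"{pct_grr_for_selection(sigma_measurement, wide_parts):.1f}%")
```

Nothing about the measurement system changes between the two calls, yet one selection lands far above the 30% line and the other below it.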

What a Genuine MSA Program Looks Like

Organizations that actually use MSA — rather than perform it — share
several characteristics:

They study measurement systems before trusting them, not after.
Before putting a new gage on the production floor, they
understand its capabilities and limitations. They know where it’s strong
and where it’s weak. They don’t wait for an audit to find out.

They match the study to the decision. If the
measurement is used for SPC, they evaluate against process variation. If
it’s used for inspection, they evaluate against tolerance. They don’t
pick the favorable metric.

They study stability over time. They use control
charts on reference standards. They track bias week by week. They know
when a gage starts drifting before it affects production decisions, not
after.

They include all the MSA elements. Bias, linearity,
stability, repeatability, reproducibility — not just the one that
produces a pass percentage.

They are honest about attribute systems. They
recognize that visual inspection is fundamentally different from
dimensional measurement. They invest in physical standards, boundary
samples, and training rather than running another study that tells them
what they already know: humans disagree about borderline cases.

They act on the results. When a study reveals that a
measurement system is inadequate, they fix it — better gages, better
fixtures, better methods, better training. They don’t rerun the study
with different parts until it passes.

The Deeper Problem: Measurement as Afterthought

The root cause of MSA theater is that most organizations treat
measurement as a necessary overhead rather than a core capability. They
invest in process equipment, in tooling, in automation. They specify
gages based on what’s available and affordable, not on what the process
actually requires. And then they run an MSA study to confirm that the
cheapest acceptable option was indeed acceptable.

This is backwards. The measurement system is the lens through which
you see your process. If the lens is distorted, everything you see is
distorted. You cannot improve what you cannot measure accurately, and
you cannot measure accurately if you treat your measurement systems as
an afterthought to be validated once and forgotten.

The organizations with the best quality are not the ones with the
most Gage R&R studies filed in binders. They are the ones whose
operators trust their gages, whose engineers trust their data, and whose
managers make decisions based on measurements that genuinely reflect
reality. That trust is earned through honest, ongoing, rigorous
attention to measurement capability — not through passing studies that
prove what everyone already knows isn’t true.

The Real Question

Next time someone hands you a Gage R&R study with a passing
result, don’t ask “what’s the percentage?” Ask these questions
instead:

  • When was this study done, and what has changed since then?
  • Were the study conditions representative of actual production
    conditions?
  • Did you evaluate bias, linearity, and stability, or just
    R&R?
  • Which method did you use — % tolerance or % variation — and
    why?
  • Who selected the parts, and what criteria did they use?
  • What would happen if we ran this study again tomorrow with different
    parts?

If the answers make you uncomfortable, good. That discomfort is more
valuable than any passing percentage. It means you’re starting to see
the measurement system for what it actually is, rather than what the
study says it should be.

The measure of a good measurement system is not whether it passes a
study. It’s whether the people using it trust it enough to make real
decisions with the numbers it gives them. Everything else is
paperwork.


Peter Stasko is a Quality Architect with over 25
years of experience in manufacturing quality management, process
improvement, and quality system design across automotive, aerospace, and
industrial manufacturing sectors. He writes about the real-world
failures of quality tools and systems — not the textbook versions, but
the ones that happen on production floors when theory meets reality.
