Quality
MSA: When Your Organization’s Measurements Are More Noise Than Signal, and the Data You’re Basing Every Decision On Turns Out to Be Mostly Measurement Error

The Multiplier You Never Measured

Here is a thought experiment that should keep you up at night.

Your CMM reports a shaft diameter of 24.987 mm. The specification is
25.000 ± 0.020 mm. The part is in tolerance. You ship it. Your customer
measures it at 25.031 mm. It’s out of tolerance. They reject the lot.
You argue. They argue back. A quality engineer spends three days writing
an 8D report. A corrective action gets assigned. A containment is
issued. Two weeks later, you discover the truth: neither measurement was
right. Your CMM was reading 0.025 mm low. Their gauge was reading 0.019
mm high. The actual dimension was 25.012 mm, in spec all along. The
entire episode cost $47,000 in engineering time, expedited shipping,
customer confidence, and a corrective action that fixed nothing because
there was nothing to fix.
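The arithmetic behind the episode is worth checking. Here is a minimal Python sketch using only the numbers in the scenario: it takes 25.012 mm as the true dimension and computes each gauge's bias as reading minus true value. The names `TRUE_VALUE` and `bias` are illustrative, not part of any standard.

```python
# Spec from the scenario: 25.000 +/- 0.020 mm.
LSL, USL = 25.000 - 0.020, 25.000 + 0.020
TRUE_VALUE = 25.012  # actual dimension, established after the dispute

readings = {"our CMM": 24.987, "customer gauge": 25.031}

def bias(reading, true_value):
    """Bias = measured value minus true value (positive means reading high)."""
    return reading - true_value

for name, reading in readings.items():
    status = "in" if LSL <= reading <= USL else "out of"
    print(f"{name}: {reading:.3f} mm ({status} tolerance), "
          f"bias {bias(reading, TRUE_VALUE):+.3f} mm")
```

Run as written, it reports the CMM reading 0.025 mm low and the customer's gauge reading 0.019 mm high: two biases of opposite sign that put one conforming part on both sides of the limit.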

What broke? Not the part. Not the process. The measurement
system.

This is the blind spot that lives inside every quality organization
on earth. You spend millions controlling your manufacturing process. You
spend hundreds of hours analyzing defect data. You build SPC charts, run
capability studies, calculate Cpk values to two decimal places. And you
assume — without verification — that the numbers feeding all of those
systems are actually correct.

Measurement System Analysis is the discipline that asks the
uncomfortable question: Before you trust the data, how much of it is
the part, and how much of it is the measurement itself?

The answer, in more organizations than anyone would like to admit, is
horrifying.

What MSA Actually Measures

Most people think MSA is about calibrating gauges. It is not.
Calibration tells you whether your instrument reads correctly against a
known standard. MSA tells you whether your entire measurement system —
the gauge, the operator, the method, the environment, the part
interaction — produces data you can trust.

Think of it this way: calibration is checking whether your bathroom
scale reads zero when nothing is on it. MSA is checking whether ten
people weighing the same person ten times get the same answer. Those are
fundamentally different questions, and confusing them is one of the most
expensive mistakes a quality organization can make.

MSA breaks down the total variation you observe in your measurements
into its components:

Part-to-part variation — the actual differences
between the things you’re measuring. This is the signal. This is what
you want to see.

Repeatability — can the same operator, using the
same gauge, measuring the same part, get the same answer twice? This is
the gauge’s contribution to noise. A micrometer with worn anvils will
give you different readings every time you click the ratchet. The part
hasn’t changed. The measurement has.

Reproducibility — can different operators, using the
same gauge, measuring the same parts, agree on the answers? This is the
human contribution to noise. One operator pushes the CMM probe harder
than another. One reads the dial at a different angle. One interprets
the fixture contact point differently. The part hasn’t changed. The
measurement has.

Interaction effects. Does Operator A read Part 1 in line with everyone
else but read Part 3 consistently high? This is where it gets subtle.
Some parts are harder to measure than others, and some operators
struggle with certain feature types. This operator-by-part interaction
is the ghost in the machine that most organizations never detect,
because exposing it takes a full crossed Gage R&R study analyzed with
the ANOVA method.

The fundamental equation is brutal in its simplicity:

Total observed variance = Actual part-to-part variance +
Measurement system variance

(It is the variances, the squared standard deviations, that add; the
standard deviations themselves do not.) When your measurement system
variance makes up 50% or more of your observed variance, you are not
measuring your process. You are measuring your measurement system.
Every control chart you plot is at least half noise. Every capability
index you calculate is fiction. Every decision you make based on that
data is a coin flip wearing a lab coat.
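Because variances add, the percentage you quote depends on whether you compare variances or standard deviations. A small Python sketch with hypothetical sigma values shows both views: `pct_contribution` is the measurement system's share of the observed variance, and `pct_grr` is its standard deviation as a percentage of the total standard deviation, which is how %GRR is usually reported.

```python
import math

def grr_metrics(sigma_part, sigma_ms):
    """Express measurement-system variation two ways.

    pct_contribution: measurement variance as a share of total variance.
    pct_grr: measurement std dev as a share of total std dev (%GRR).
    """
    var_total = sigma_part**2 + sigma_ms**2  # variances add
    return {
        "pct_contribution": 100 * sigma_ms**2 / var_total,
        "pct_grr": 100 * sigma_ms / math.sqrt(var_total),
    }

# Hypothetical: measurement noise as large as the true part-to-part spread.
print(grr_metrics(sigma_part=1.0, sigma_ms=1.0))  # 50% of variance, %GRR ≈ 70.7
```

With equal part and measurement sigmas, the measurement system is half the variance but roughly 71% GRR, so the coin-flip point sits well past the usual 30% GRR limit.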

The Gage R&R Study: Your Measurement System’s Stress Test

The core tool of MSA is the Gage Repeatability and Reproducibility
study. The design is elegant in its simplicity:

Select 10 parts that span the full range of your process variation.
Select 3 operators who normally perform this measurement. Have each
operator measure each part 3 times, in random order, blind to previous
readings. That gives you 90 data points. From those 90 numbers, you can
decompose the entire measurement system.
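Randomization and blinding are the steps most often skipped. One way to generate a run sheet is sketched below in Python, assuming three-digit coded labels as the blinding method. `make_run_sheet`, the code range, and the per-round reshuffle are illustrative choices, not an AIAG-prescribed procedure.

```python
import random

def make_run_sheet(n_parts=10, operators=("A", "B", "C"), trials=3, seed=None):
    """Randomized measurement order for a crossed Gage R&R study.

    Each trial is a separate round; within every (trial, operator) round the
    part order is reshuffled so operators cannot anticipate or recall readings.
    Parts carry hypothetical coded labels; the key mapping codes back to real
    part numbers stays with the study coordinator.
    """
    rng = random.Random(seed)
    codes = rng.sample(range(100, 1000), n_parts)  # blind 3-digit codes
    key = dict(zip(codes, range(1, n_parts + 1)))  # code -> true part number
    sheet = []
    for trial in range(1, trials + 1):
        for op in rng.sample(list(operators), len(operators)):
            for code in rng.sample(codes, n_parts):
                sheet.append({"trial": trial, "operator": op, "part_code": code})
    return sheet, key

sheet, key = make_run_sheet(seed=42)
print(len(sheet))  # 90 readings: 10 parts x 3 operators x 3 trials
print(sheet[:3])
```

The coordinator keeps `key`; operators only see the `part_code` column, so they cannot match a reading to an earlier one.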

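The AIAG (Automotive Industry Action Group) MSA reference manual
provides clear acceptance criteria for %GRR, the measurement system’s
standard deviation expressed as a percentage of the total observed
variation (or of the tolerance, when the gauge is used to accept parts):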

Under 10% GRR — Your measurement system is
acceptable. The data it produces is trustworthy. You can make process
decisions with confidence.
10% to 30% GRR — Your measurement system is
marginal. It might be acceptable depending on the application, the
criticality of the characteristic, and the cost of improving it. But you
should be uneasy.
Over 30% GRR — Your measurement system is
unacceptable. The data it produces cannot be trusted for process
analysis or decision-making. You are essentially guessing with extra
steps.
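To see where those percentages come from, here is a minimal Python sketch of the ANOVA method for a crossed study (parts × operators × trials). It assumes a balanced design, estimates variance components from the mean squares, reports %GRR against total variation, and computes the number of distinct categories (ndc = 1.41 × σ_part / σ_GRR), which AIAG also expects to be at least 5. The study data are simulated, and the sketch always keeps the operator-by-part interaction term, whereas some software drops it when it is not statistically significant.

```python
import math
import random
from statistics import fmean

def gage_rr_anova(data):
    """Crossed Gage R&R via two-way ANOVA with interaction.

    data[i][j] holds the repeat readings of part i by operator j; the design
    is assumed balanced (same number of trials in every cell).
    """
    p, o, r = len(data), len(data[0]), len(data[0][0])
    grand = fmean(y for part in data for cell in part for y in cell)
    part_means = [fmean(y for cell in part for y in cell) for part in data]
    op_means = [fmean(y for part in data for y in part[j]) for j in range(o)]
    cell_means = [fmean(cell) for part in data for cell in part]

    # Sums of squares for parts, operators, interaction and repeat error.
    ss_part = o * r * sum((m - grand) ** 2 for m in part_means)
    ss_op = p * r * sum((m - grand) ** 2 for m in op_means)
    ss_cells = r * sum((m - grand) ** 2 for m in cell_means)
    ss_total = sum((y - grand) ** 2 for part in data for cell in part for y in cell)
    ss_inter = ss_cells - ss_part - ss_op
    ss_error = ss_total - ss_cells

    # Mean squares (sum of squares / degrees of freedom).
    ms_part = ss_part / (p - 1)
    ms_op = ss_op / (o - 1)
    ms_inter = ss_inter / ((p - 1) * (o - 1))
    ms_error = ss_error / (p * o * (r - 1))

    # Variance components from expected mean squares; negatives set to zero.
    var_repeat = ms_error
    var_inter = max(0.0, (ms_inter - ms_error) / r)
    var_op = max(0.0, (ms_op - ms_inter) / (p * r))
    var_part = max(0.0, (ms_part - ms_inter) / (o * r))
    var_grr = var_repeat + var_op + var_inter
    return {
        "var_repeatability": var_repeat,
        "var_reproducibility": var_op + var_inter,
        "var_part": var_part,
        "pct_grr": 100 * math.sqrt(var_grr / (var_grr + var_part)),
        "ndc": int(1.41 * math.sqrt(var_part / var_grr)) if var_grr > 0 else None,
    }

def classify(pct_grr):
    """AIAG guideline bands for %GRR."""
    if pct_grr < 10:
        return "acceptable"
    if pct_grr <= 30:
        return "marginal"
    return "unacceptable"

# Simulated study: 10 parts x 3 operators x 3 trials (all values hypothetical).
rng = random.Random(1)
true_sizes = [25.0 + rng.gauss(0, 0.010) for _ in range(10)]
op_offsets = [0.000, 0.002, -0.002]
data = [[[size + off + rng.gauss(0, 0.003) for _ in range(3)] for off in op_offsets]
        for size in true_sizes]
result = gage_rr_anova(data)
print(f"%GRR = {result['pct_grr']:.1f}% ({classify(result['pct_grr'])}), "
      f"ndc = {result['ndc']}")
```

Splitting `var_repeatability` from `var_reproducibility` is what later tells you whether to look at the gauge or the method.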

Here is the part that surprises people: in my experience auditing
measurement systems across automotive, aerospace, and pharmaceutical
companies, more than half of all Gage R&R studies initially come
back above 30%. The measurement systems that organizations have been
trusting for years — the ones feeding their SPC charts, their capability
studies, their acceptance decisions — are, in many cases, producing data
that is more noise than signal.

This is not a failure of intent. It is a failure of awareness. Nobody
set out to build a bad measurement system. They just never tested
it.

The Hidden Costs of Bad Measurement

The costs of an unanalyzed measurement system cascade through your
organization in ways that are difficult to see because they look like
process problems, not measurement problems.

False rejects. Your process is running fine, but
your measurement system says it’s not. You scrap good parts. You rework
conforming product. You shut down lines that don’t need to be shut down.
I once worked with a medical device manufacturer that was scrapping 12%
of its catheter production based on diameter measurements. A Gage
R&R study revealed that their laser micrometer had 45% GRR. They
were throwing away nearly half a million dollars a year in perfectly
good product. The fix — a better fixture design and a measurement
procedure change — cost $8,000.

False accepts. This is the dangerous one. Your
process is producing nonconforming parts, but your measurement system
doesn’t detect them. You ship bad product. Your customer finds it. Or
worse, your customer’s customer finds it. In the aerospace industry, a
false accept on a critical dimension isn’t a quality issue; it’s a
safety issue. Accreditation programs such as Nadcap exist largely
because the consequences of trusting bad measurement data can be
catastrophic.
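How often does noise flip a verdict? Here is a rough Monte Carlo sketch in Python. It assumes normally distributed true part values and unbiased, normally distributed measurement error; both are simplifying assumptions that real processes and gauges may violate. The process parameters in the example loop are hypothetical.

```python
import random

def misclassification_rates(mean, sigma_process, sigma_ms, lsl, usl,
                            n=200_000, seed=0):
    """Estimate (false reject rate, false accept rate) by simulation.

    False reject rate: share of truly conforming parts the gauge rejects.
    False accept rate: share of truly nonconforming parts the gauge accepts.
    """
    rng = random.Random(seed)
    good = bad = false_rejects = false_accepts = 0
    for _ in range(n):
        true_value = rng.gauss(mean, sigma_process)
        measured = true_value + rng.gauss(0.0, sigma_ms)
        truly_ok = lsl <= true_value <= usl
        reads_ok = lsl <= measured <= usl
        if truly_ok:
            good += 1
            if not reads_ok:
                false_rejects += 1
        else:
            bad += 1
            if reads_ok:
                false_accepts += 1
    return false_rejects / max(good, 1), false_accepts / max(bad, 1)

# Hypothetical process: 25.000 +/- 0.020 mm, centered, sigma 0.007 mm.
for sigma_ms in (0.001, 0.005):
    fr, fa = misclassification_rates(25.000, 0.007, sigma_ms, 24.980, 25.020)
    print(f"gauge sigma {sigma_ms} mm: false reject {fr:.2%}, false accept {fa:.2%}")
```

Comparing the two gauge sigmas shows how both error types grow as measurement noise grows, even though the process itself is unchanged.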

SPC paralysis. Your control charts show points
jumping above and below the control limits. Your process engineers spend
weeks investigating assignable causes that don’t exist. The real cause
is measurement noise. But because you never quantified the measurement
system contribution, you can’t separate the signal from the noise. Your
continuous improvement resources are consumed by ghost hunts.

Meaningless capability indices. You report a Cpk of
1.67 to your customer. They’re pleased. But if your measurement system
contributes 40% of the observed variance, your actual process
capability could be substantially higher or lower than the number
suggests: random measurement noise inflates the observed spread and
understates capability, while an unmeasured bias can shift the mean
toward or away from a specification limit. You don’t know which way it
cuts. The Cpk you reported is a number, not a fact. In the automotive
supply chain, where Cpk requirements of 1.67 or 2.0 are often
contractual obligations, reporting capability indices from unvalidated
measurement systems is not just inaccurate; it’s potentially a
contractual violation.
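The direction of the error can be sketched numerically. This Python example assumes, hypothetically, a 25.000 ± 0.020 mm characteristic with an observed mean of 25.000 mm and sigma of 0.004 mm (a reported Cpk of about 1.67), and reads "40% of the observed variation" as 40% of the variance. It removes the measurement variance from the observed variance, then adds an assumed 0.006 mm low gauge bias to show how an uncharacterized bias can move the answer the other way.

```python
import math

def cpk(mean, sigma, lsl, usl):
    """Cpk = distance from the mean to the nearer limit, in units of 3 sigma."""
    return min(usl - mean, mean - lsl) / (3 * sigma)

def process_sigma(sigma_observed, sigma_ms):
    """Strip measurement variance out of observed variance (variances add)."""
    if sigma_ms >= sigma_observed:
        raise ValueError("measurement spread cannot exceed observed spread")
    return math.sqrt(sigma_observed**2 - sigma_ms**2)

LSL, USL = 24.980, 25.020
observed_mean, observed_sigma = 25.000, 0.004
sigma_ms = math.sqrt(0.40) * observed_sigma  # measurement = 40% of observed variance

true_sigma = process_sigma(observed_sigma, sigma_ms)
gauge_bias = -0.006                      # hypothetical: gauge reads 0.006 mm low
true_mean = observed_mean - gauge_bias   # so the real mean sits higher

print(f"reported Cpk:             {cpk(observed_mean, observed_sigma, LSL, USL):.2f}")
print(f"noise removed:            {cpk(observed_mean, true_sigma, LSL, USL):.2f}")
print(f"noise removed, with bias: {cpk(true_mean, true_sigma, LSL, USL):.2f}")
```

Under these assumptions the three lines come out at about 1.67, 2.15, and 1.51: noise alone made the process look worse than it is, while the bias made it look better.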

Supplier disputes. Your supplier says the parts are
good. Your incoming inspection says they’re not. You send the parts
back. The supplier re-measures and gets a different answer. The dispute
escalates. A quality engineer from each company drives to a neutral
third-party lab. Three days and $15,000 later, they discover that the
two measurement systems disagree with each other, and neither one has
ever been analyzed. The parts were fine all along.

The Five Categories of Measurement Error

MSA doesn’t just give you a pass/fail score. It tells you
where your measurement system is breaking down. The AIAG MSA
manual describes five properties of measurement error, three affecting
location and two affecting width. Together, these “Big Five” map the
terrain of measurement failure:

Bias. Your measurement system consistently reads
high or low. A torque wrench that reads 2 Nm high across its entire
range. A CMM with a probe that’s slightly out of position. Bias is the
easiest error to detect — measure a known reference standard and see if
you get the known value. It’s also the easiest to correct — apply an
offset. And yet, many organizations check bias only during annual
calibration, if at all, and never under actual measurement conditions
with actual operators.
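A basic bias study is short enough to sketch in full. This Python example assumes ten repeated readings of a single reference standard taken under normal conditions (the torque values are hypothetical), computes bias as the mean reading minus the reference value, and tests it against zero with a t statistic. The critical value 2.262 is the two-sided 95% t value for 9 degrees of freedom, so it only applies to exactly ten readings.

```python
import math
from statistics import fmean, stdev

def bias_study(readings, reference):
    """Return (bias, t statistic) for the hypothesis bias = 0."""
    n = len(readings)
    bias = fmean(readings) - reference
    std_error = stdev(readings) / math.sqrt(n)
    return bias, bias / std_error

# Hypothetical: a torque wrench checked ten times against a 50.0 Nm reference.
readings = [51.8, 52.1, 52.3, 51.9, 52.0, 52.2, 51.7, 52.1, 52.0, 51.9]
bias, t = bias_study(readings, reference=50.0)
T_CRIT_95_DF9 = 2.262  # two-sided 95%, n - 1 = 9 degrees of freedom
print(f"bias = {bias:+.2f} Nm, t = {t:.1f}, significant: {abs(t) > T_CRIT_95_DF9}")
print(f"offset to apply: {-bias:+.2f} Nm")
```

Running the same check with your actual operators, rather than only at calibration, is what catches bias under real measurement conditions.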

Linearity. Your measurement system is accurate at
one end of its range and inaccurate at the other. A micrometer that
reads perfectly at 25 mm but drifts 0.010 mm at 50 mm. This is
particularly dangerous because calibration is often performed at a
single point. If that point is the one where the gauge happens to be
accurate, the linearity error goes undetected. I’ve seen coordinate
measuring machines that were certified as calibrated — at the
calibration point — while exhibiting 0.030 mm of non-linearity across
their working volume.

Stability. Your measurement system drifts over time.
The first shift gets different readings from the third shift not because
the process changed but because the gauge changed. Thermal expansion in
the afternoon. A probe tip that wears gradually. A gauge that was
dropped on Monday and nobody reported it. Stability is the error that
hides in the gap between calibrations. If you only check your
measurement system during annual calibration events, you have no idea
what it was doing in the eleven months in between.
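One common way to watch stability between calibrations is to measure the same master part on a schedule and plot the results on an individuals chart. Here is a minimal Python sketch assuming hypothetical readings of a 10 mm master: limits come from a baseline period (2.66 is 3/d2, with d2 = 1.128 for moving ranges of two), and later readings are flagged only if they fall outside those limits. The usual run rules are not implemented, and a real baseline would ideally be longer.

```python
from statistics import fmean

def xmr_limits(baseline):
    """Lower limit, center line, upper limit for an individuals chart."""
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    center = fmean(baseline)
    half_width = 2.66 * fmean(moving_ranges)  # 2.66 = 3 / 1.128
    return center - half_width, center, center + half_width

def out_of_limits(readings, limits):
    """Indices of readings that fall outside the control limits."""
    lcl, _, ucl = limits
    return [i for i, x in enumerate(readings) if not lcl <= x <= ucl]

# Hypothetical daily readings (mm) of a 10 mm master part.
baseline = [10.002, 10.001, 10.003, 10.002, 10.000,
            10.001, 10.002, 10.003, 10.001, 10.002]
limits = xmr_limits(baseline)
new_readings = [10.002, 10.003, 10.006, 10.007]
print("limits:", [round(x, 4) for x in limits])
print("out of limits:", out_of_limits(new_readings, limits))
```

Because the master part does not change, any signal on this chart belongs to the measurement system.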

Repeatability. The same operator, the same gauge,
the same part, different answers. Worn contact surfaces. Loose fixtures.
Environmental vibration. Digital resolution that’s inadequate for the
tolerance being measured. Repeatability problems almost always point to
hardware issues — the gauge itself needs maintenance, replacement, or a
fundamentally different measurement approach.

Reproducibility. Different operators, different
answers. This is almost always a training, procedure, or fixture
problem. One operator holds the part in the fixture with more force than
another. One operator reads the display from a different angle. One
operator takes the measurement at a slightly different location on the
feature. The fix is usually not the gauge — it’s the method. Better
fixtures, clearer work instructions, mistake-proofed measurement setups,
and standardized techniques.

Understanding which category your measurement error falls into tells
you exactly where to invest your improvement effort. Fixing a
repeatability problem by retraining operators won’t work; the gauge is
the problem. Fixing a reproducibility problem by buying a more expensive
gauge won’t work; the method is the problem. MSA gives you the
diagnosis. Without it, improvement effort is guesswork.

The Attribute Measurement Trap

So far, we’ve been talking about variable measurements — things you
can express as numbers. But a vast amount of quality measurement is
attribute-based: pass or fail, good or bad, visual defect present or
absent. And the MSA story for attribute data is, if anything, even more
alarming.

Attribute agreement analysis works differently from Gage R&R. You
take a set of known reference parts — some good, some bad, some
borderline — and you have multiple appraisers classify them multiple
times. Then you measure four things:

Within-appraiser agreement. Does the same inspector
classify the same part the same way on repeat trials? You’d be stunned
how often they don’t. An inspector passes a part on Monday and fails the
same part on Wednesday. The part hasn’t changed. The lighting is
different. The inspector’s mood is different. They’re more tired. The
borderline cases flip depending on conditions that have nothing to do
with the part.

Between-appraiser agreement. Do different inspectors
agree with each other? In one study I conducted at a consumer
electronics plant, three visual inspectors evaluating surface finish
defects agreed with each other only 64% of the time. That means on more
than a third of the parts, at least one inspector would have made a
different call than the others. The “standard” they were using was a set
of limit samples that two of the three inspectors had never been
formally trained on.

Agreement with standard. Do the inspectors agree
with the known reference? This is the acid test. If the correct answer
is “fail” and your inspector calls it “pass” 15% of the time, you have a
15% false accept rate baked into your quality system. For a
pharmaceutical company inspecting vials for particulate contamination,
that 15% is a patient safety issue.

Effectiveness, miss rate, and false alarm rate.
These three metrics give you the full picture. Effectiveness: what
percentage of all inspection decisions match the reference? Miss rate:
what percentage of defective parts slip through as good? False alarm
rate: what percentage of good parts do your inspectors reject? In my
experience, miss rates of 10-20% and false alarm rates of 5-15% are
common in visual inspection operations that have never been formally
studied.
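The four measures above can be computed from a simple table of calls. Here is a Python sketch on a deliberately tiny, hypothetical data set (six reference parts, two appraisers, two trials each). It assumes binary good/bad calls, counts effectiveness, miss rate, and false alarm rate over individual decisions, and counts within-appraiser, between-appraiser, and vs-standard agreement at the part level, where a part counts only if every relevant call agrees. Published procedures differ on some of these conventions, and kappa statistics are not included.

```python
def attribute_agreement(reference, calls):
    """Attribute agreement metrics for binary 'good'/'bad' inspection calls.

    reference: the known classification of each part.
    calls: {appraiser: [trial_1, trial_2, ...]}, each trial a list of calls
           in the same part order as `reference`.
    """
    n = len(reference)
    report = {}
    for name, trials in calls.items():
        per_part = list(zip(*trials))  # this appraiser's calls on each part
        decisions = [(call, ref) for part_calls, ref in zip(per_part, reference)
                     for call in part_calls]
        bad_opps = sum(ref == "bad" for _, ref in decisions)
        good_opps = len(decisions) - bad_opps
        misses = sum(call == "good" and ref == "bad" for call, ref in decisions)
        false_alarms = sum(call == "bad" and ref == "good" for call, ref in decisions)
        report[name] = {
            # part level: every trial gave the same call
            "within": sum(len(set(c)) == 1 for c in per_part) / n,
            # part level: every trial matched the reference
            "vs_standard": sum(all(x == ref for x in c)
                               for c, ref in zip(per_part, reference)) / n,
            # decision level
            "effectiveness": sum(call == ref for call, ref in decisions) / len(decisions),
            "miss_rate": misses / bad_opps if bad_opps else 0.0,
            "false_alarm_rate": false_alarms / good_opps if good_opps else 0.0,
        }
    # part level: every call by every appraiser on every trial agreed
    between = sum(
        len({trial[i] for trials in calls.values() for trial in trials}) == 1
        for i in range(n)
    ) / n
    return report, between

reference = ["good", "good", "bad", "bad", "good", "bad"]
calls = {
    "A": [["good", "good", "bad", "bad", "good", "bad"],
          ["good", "good", "bad", "good", "good", "bad"]],
    "B": [["good", "bad", "bad", "bad", "good", "bad"],
          ["good", "bad", "bad", "bad", "good", "bad"]],
}
report, between = attribute_agreement(reference, calls)
for name, metrics in report.items():
    print(name, {k: round(v, 2) for k, v in metrics.items()})
print("between appraisers:", round(between, 2))
```

In this toy data, appraiser B is perfectly consistent yet consistently wrong on one good part, which is exactly the pattern that within-appraiser agreement alone would hide.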

The automotive industry addresses this with the requirement for
attribute Gage R&R studies during PPAP submissions. But the
requirement is often treated as a checkbox — minimum sample sizes,
perfect reference samples that make the study trivially easy, and
results that are never revisited after the initial submission. The real
value of attribute MSA comes not from the initial study but from the
ongoing discipline of monitoring and improving your inspection
system.

The Practical Path Forward

If you’re reading this and realizing that your organization has
measurement systems that have never been analyzed, or were analyzed once
years ago and never revisited, here is a practical approach:

Start with your critical measurements. Not every
gauge needs a full Gage R&R study. Focus on the measurements that
drive acceptance decisions, process adjustments, and customer-facing
data. If a measurement can trigger a reject, a machine stop, or a
customer report, it needs to be analyzed.

Run the study under actual conditions. The biggest
mistake organizations make is running Gage R&R studies in a lab
environment with selected parts and their best operators. This produces
a best-case result that doesn’t reflect reality. Use your normal
operators, your normal parts, your normal environment. You want to know
how the measurement system performs on a Tuesday afternoon on the shop
floor, not in a climate-controlled metrology lab.

Read the results honestly. A 45% GRR doesn’t mean
your study was wrong, as long as your parts genuinely span the process
range. It means your measurement system is wrong. Don’t adjust the
study to make the measurement look good. Fix the measurement.

Prioritize your fixes. If reproducibility is the
problem, standardize your method. If repeatability is the problem, fix
or replace your gauge. If both are problems, you probably need a
fundamentally different measurement approach — and that’s okay. Knowing
is better than not knowing.

Re-study periodically. Measurement systems degrade.
Gauges wear. Operators change. Procedures drift. An MSA study is not a
one-time event. It’s a periodic health check. Annual re-studies for
critical measurement systems should be a non-negotiable part of your
quality calendar.

Integrate MSA into your APQP process. Don’t wait
until production to validate your measurement systems. During the
planning phase, when you’re selecting gauges and writing measurement
plans, run feasibility studies. Discover that your proposed measurement
approach has 50% GRR during planning — not during your first production
run when the customer is waiting for parts.

The Deeper Truth

Measurement System Analysis is not really about gauges. It is about
epistemology — how you know what you think you know. Every quality
system on earth is built on the assumption that its measurements are
accurate and precise. MSA is the discipline that tests that assumption.
When the assumption holds, your quality system is grounded in reality.
When it doesn’t — and it often doesn’t — your entire quality management
system is a theater production with no connection to the physical
world.

The organizations that take MSA seriously are not the ones with the
most expensive gauges. They are the ones that understand a simple truth:
a measurement you haven’t validated isn’t a measurement. It’s an
opinion with a number attached.

And in quality, opinions — even numerical ones — are not a basis for
decisions that affect product safety, customer satisfaction, and
organizational survival.

Validate your measurements. The cost of the study is trivial compared
to the cost of the decisions you’re making on unvalidated data.

Peter Stasko is a Quality Architect with 25+ years of experience
transforming organizations across automotive, aerospace, and
pharmaceutical industries. He has led MSA programs that uncovered
measurement system errors responsible for millions in annual false
rejects, and he has never once regretted the discomfort of running a
Gage R&R study that told a client something they didn’t want to hear
— because the alternative was making critical decisions on data they
couldn’t trust.
