Quality and Goodhart’s Law: When Your Organization’s Metrics Become Targets and the Numbers You Optimized Stop Meaning What You Thought They Meant
There’s a moment in every quality manager’s career when they realize
something has gone terribly wrong with their measurement system. Not the
instruments — those are calibrated. Not the operators — they’re trained.
The problem is that the numbers themselves have become the enemy of what
they were supposed to represent.
The defect rate dropped to 0.3% last quarter. Everyone celebrated.
The dashboard turned green. The VP sent a congratulatory email. And
somewhere on the shop floor, a team leader quietly changed how defects
get classified so that items that used to be failures now appear as
“rework within specification.”
If you’ve worked in quality for more than five years, you’ve seen
this movie. You know how it ends. And you know it has a name.
It’s called Goodhart’s Law, and it’s eating your quality system from
the inside out.
What Goodhart’s Law Actually Says
The original formulation comes from the British economist Charles
Goodhart, who observed in 1975 that “any observed statistical regularity
will tend to collapse once pressure is placed upon it for control
purposes.” The more popular version, attributed to Marilyn Strathern, is
simpler and more devastating: When a measure becomes a target,
it ceases to be a good measure.
In economics, this explains why monetary aggregates stopped predicting
economic behavior once central banks began targeting them. In education,
it explains why teaching to the test replaces actual learning. In
quality management, it explains why KPIs you introduced with the best
intentions eventually stopped working, and nobody could figure out why.
Goodhart’s Law isn’t a theory. It’s a prediction. And it comes true
every single time.
The Anatomy of Metric Decay
Here’s how it happens in a manufacturing environment, step by
step.
Phase 1: The Metric Works. You introduce first-pass
yield as a key performance indicator. For the first three months, it’s
genuinely useful. It tells you something real about your process. People
pay attention to it. Decisions improve. The number goes up because
actual quality improves.
Phase 2: The Stakes Rise. Someone in leadership
decides that first-pass yield should be tied to performance reviews. Or
bonuses. Or departmental rankings. Suddenly the metric isn’t just
information — it’s consequences.
Phase 3: The Gaming Begins. The team doesn’t
consciously decide to cheat. But the incentives are clear, and human
beings are remarkably creative when their livelihood depends on a
number. Rework gets reclassified. Boundary conditions get reinterpreted.
The operator who used to flag a borderline defect now lets it slide
because the cost of being wrong about one part is lower than the cost of
being wrong about the metric.
Phase 4: The Metric Dies. First-pass yield is now
98.7%. It’s been 98.7% for six months. Your customer complaints haven’t
changed. Your warranty claims haven’t changed. Your scrap costs haven’t
changed. But your dashboard looks beautiful. The metric has stopped
measuring reality and started measuring the organization’s ability to
produce the metric.
This isn’t speculation. In my experience, it happens in factories in
every industry and on every continent. The timeline varies (sometimes it
takes three months, sometimes three years), but the destination is the
same.
The Five Patterns of Metric Corruption in Quality
Over two and a half decades of auditing and consulting, I’ve seen
Goodhart’s Law manifest in five recurring patterns. Recognize any of
these?
1. Reclassification
The most common form of metric gaming. When the cost of reporting a
defect exceeds the cost of hiding it, reclassification becomes the
rational response.
A Tier 1 automotive supplier I worked with had an impressive 99.4%
delivery performance. When I dug into the data, I discovered that late
deliveries were being reclassified as “rescheduled by customer” if the
customer agreed — under pressure — to accept the shipment a day late.
The metric measured the ability to get customers to accept late
delivery, not the ability to deliver on time.
The fix isn’t more audits. The fix is asking: Does the person
who reports this number have a stake in what it says? If yes,
you have a structural conflict of interest that no amount of training
will resolve.
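One way to surface this kind of reclassification is to compute the same
metric twice: once against the due date as currently recorded, and once
against the date originally promised. The following is a minimal sketch
in Python, not the supplier’s actual system. The `Shipment` fields, the
`on_time_rate` function, and the sample dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Shipment:
    original_due: date  # date first promised to the customer (assumed field)
    current_due: date   # due date after any "rescheduling" (assumed field)
    delivered: date


def on_time_rate(shipments, use_original_due):
    """Share of shipments delivered on or before the chosen due date."""
    if not shipments:
        raise ValueError("no shipments to evaluate")
    on_time = sum(
        1
        for s in shipments
        if s.delivered <= (s.original_due if use_original_due else s.current_due)
    )
    return on_time / len(shipments)


# Hypothetical sample: two of four shipments were "rescheduled" by one day.
shipments = [
    Shipment(date(2024, 3, 1), date(2024, 3, 1), date(2024, 3, 1)),
    Shipment(date(2024, 3, 4), date(2024, 3, 5), date(2024, 3, 5)),
    Shipment(date(2024, 3, 6), date(2024, 3, 6), date(2024, 3, 6)),
    Shipment(date(2024, 3, 8), date(2024, 3, 9), date(2024, 3, 9)),
]

reported = on_time_rate(shipments, use_original_due=False)
against_promise = on_time_rate(shipments, use_original_due=True)
print(f"reported: {reported:.0%}, against original promise: {against_promise:.0%}")
```

A persistent gap between the two numbers doesn’t prove gaming, since
customers do sometimes reschedule for their own reasons. It does tell
you where to start asking questions.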
2. Threshold Fixation
When you set a target — say, 2% scrap rate — people will optimize to
hit exactly that number. Not to minimize scrap. To hit 2%.
I visited a plant that had maintained a scrap rate between 1.8% and
2.0% for eighteen consecutive months. The consistency was statistically
suspicious. It turned out that when scrap ran low early in the month,
the team would ease off on process controls to make sure they didn’t
“waste” their buffer. They were managing to the target, not to the
process.
This is the most insidious form of Goodhart’s Law because it doesn’t
involve dishonesty. The team was behaving rationally given the incentive
structure. The problem wasn’t the people. It was the system.
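“Statistically suspicious” can be made concrete. If each unit were
scrapped independently with the same probability, chance alone would
produce a certain amount of month-to-month variation, and a real process
usually shows more than that, not less. The sketch below compares
observed variance with that binomial baseline. The monthly figures, the
production volume, and the 0.5 cutoff are assumptions for illustration,
and the binomial model itself is an idealization.

```python
from statistics import mean, variance


def dispersion_ratio(monthly_rates, units_per_month):
    """Observed month-to-month variance divided by the variance that
    binomial sampling alone would produce at this volume."""
    p = mean(monthly_rates)
    expected_var = p * (1 - p) / units_per_month
    return variance(monthly_rates) / expected_var


# Hypothetical data: 18 months of scrap rates, all between 1.8% and 2.0%.
monthly_rates = [
    0.0190, 0.0195, 0.0188, 0.0199, 0.0192, 0.0185,
    0.0197, 0.0190, 0.0194, 0.0183, 0.0198, 0.0191,
    0.0189, 0.0196, 0.0187, 0.0193, 0.0200, 0.0186,
]
UNITS_PER_MONTH = 5000  # assumed production volume

ratio = dispersion_ratio(monthly_rates, UNITS_PER_MONTH)
suspiciously_stable = ratio < 0.5  # assumed cutoff, not a formal test
print(f"dispersion ratio: {ratio:.2f}, suspiciously stable: {suspiciously_stable}")
```

A ratio far below 1 means the numbers move less than chance alone would
allow. That is a reason for a conversation about how the target is being
managed, not an accusation.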
3. Measure Migration
When a metric becomes a target, organizations unconsciously shift
their attention away from unmeasured dimensions of quality.
A medical device company I consulted for tracked final inspection
pass rate religiously — 99.1% and climbing. What they didn’t track was
the number of non-conformances caught at incoming inspection, which had
tripled over the same period. The quality hadn’t improved. It had
migrated. The defects were being caught earlier in the process, which is
good, but the cost of catching them had increased dramatically because
the organization had stopped investing in process control and started
investing in inspection.
The measure that improved was downstream. The cost that exploded was
upstream. And nobody noticed because nobody was looking at the
relationship between the two.
4. Data Smoothing
Aerospace manufacturer. Monthly quality review. The line chart of
customer complaints showed a beautiful, smooth decline over twelve
months. Too beautiful. Too smooth.
When I asked to see the raw data by week and by day, the pattern was
volatile, with high weeks and low weeks and no clear trend. The monthly
report was averaging out the variation in a way that concealed the real
story: a systematic problem with Monday starts after weekend changeovers.
The team wasn’t deliberately smoothing the data. The reporting
cadence was doing it for them. When you aggregate data to the level
where it looks good on a slide, you lose the resolution needed to
actually manage the process.
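The effect of aggregation is easy to reproduce. The sketch below groups
the same records two ways, by month and by weekday. The complaint counts
are invented for illustration (nine every Monday, three on other working
days), and `mean_by` is a hypothetical helper, not part of any reporting
tool.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def mean_by(records, key):
    """Average the counts after grouping the (day, count) records by key(day)."""
    groups = defaultdict(list)
    for day, count in records:
        groups[key(day)].append(count)
    return {k: mean(v) for k, v in groups.items()}


# Hypothetical data: eight weeks of working days starting Monday 2024-01-01.
start = date(2024, 1, 1)
records = []
for offset in range(56):
    day = start + timedelta(days=offset)
    if day.weekday() < 5:  # Monday=0 ... Friday=4
        records.append((day, 9 if day.weekday() == 0 else 3))

by_month = mean_by(records, lambda d: d.month)
by_weekday = mean_by(records, lambda d: d.weekday())

print("by month:  ", {m: round(v, 2) for m, v in by_month.items()})
print("by weekday:", by_weekday)
```

The monthly averages sit close together and look unremarkable. The
weekday view shows Mondays running at three times the rate of every
other day, which is the only number in this data set worth acting on.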
5. Target Inflation Without Context
“We improved our OEE from 72% to 86%.” Did you? Or did you change
what counts as planned downtime?
“We reduced customer complaints by 40%.” Did you? Or did you make it
harder for customers to submit complaints?
“We achieved ISO 9001 certification.” Did your quality improve? Or
did your documentation?
Every one of these is Goodhart’s Law in action. The number improved.
The reality behind it didn’t.
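The OEE case is worth working through, because the arithmetic shows how
little has to change. Using the common definition (OEE = availability ×
performance × quality, with availability measured against planned
production time), moving stoppage minutes from “unplanned” to “planned”
raises the score without adding a single good part. The shift length,
downtime minutes, and rates below are assumed figures, not data from any
plant.

```python
def oee(shift_min, planned_down_min, unplanned_down_min, performance, quality):
    """OEE = availability * performance * quality, where availability is
    run time divided by planned production time."""
    planned_production = shift_min - planned_down_min
    run_time = planned_production - unplanned_down_min
    availability = run_time / planned_production
    return availability * performance * quality


# Assumed figures: 480-minute shift, 120 minutes of total stoppage,
# performance and quality both 95%.
honest = oee(480, planned_down_min=30, unplanned_down_min=90,
             performance=0.95, quality=0.95)

# Same shift, same 360 minutes of actual running, but 75 of the unplanned
# minutes are now booked as planned downtime.
gamed = oee(480, planned_down_min=105, unplanned_down_min=15,
            performance=0.95, quality=0.95)

print(f"honest OEE: {honest:.1%}, after reclassification: {gamed:.1%}")
```

The machine ran for exactly the same number of minutes in both cases.
Only the denominator moved. When an OEE figure jumps, the first thing to
check is whether the definition of planned downtime jumped with it.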
Why This Keeps Happening
Goodhart’s Law isn’t a failure of integrity. It’s a failure of system
design.
When you attach consequences to a metric, you change the relationship
between the observer and the observed. This is the measurement
equivalent of the observer effect (often loosely attributed to the
Heisenberg Uncertainty Principle): the act of measurement changes the
behavior being measured.
But there’s a deeper reason. Most organizations operate under what I
call the “Metric Fundamentalism” fallacy — the belief that if something
can be measured, the measurement is the thing. That the map is the
territory. That the dashboard is the factory.
It isn’t. The metric is a shadow cast by reality. When you start
chasing the shadow, you lose sight of what’s casting it.
And there’s a psychological dimension that most quality systems
completely ignore. When people are evaluated on numbers, their cognitive
resources shift from “how do I improve the process?” to “how do I
improve the number?” These sound similar but they are fundamentally
different questions with fundamentally different answers.
The Goodhart-Resistant Quality System
You cannot eliminate Goodhart’s Law. It’s as fundamental as entropy.
But you can design systems that are resistant to it. Here’s how.
Use Metrics as Flashlights, Not Scoreboards
The purpose of a metric is to illuminate, not to judge. When you use
a flashlight to find something in a dark room, you don’t grade the
flashlight on what it found. You use what it revealed to make
decisions.
In practice, this means decoupling metrics from individual
performance evaluation. Use them for process monitoring, for trend
analysis, for resource allocation. But the moment you tie someone’s
bonus to a number, you’ve planted the seed of Goodhart’s Law.
This doesn’t mean no accountability. It means accountability based on
triangulation — multiple data sources, qualitative and
quantitative, that converge on a picture of reality. One metric can be
gamed. Three independent metrics that all tell the same story are much
harder to corrupt.
Measure Processes, Not Outcomes
Outcomes are what get gamed. Processes are what actually produce
quality.
Instead of tracking defect rate (outcome), track whether the process
control plan was followed (process). Instead of tracking customer
complaints (outcome), track the response time and effectiveness of
corrective actions (process).
Process metrics are harder to game because they describe what you
did, not what happened. And what you did is within
your control in a way that outcomes — influenced by variation, external
factors, and randomness — never fully are.
Build Redundancy Into Your Measurement System
Never rely on a single metric to tell you anything important. If you
want to know about quality, look at defect rates AND customer complaints
AND warranty claims AND internal audit findings AND employee turnover in
quality-critical roles.
If all five are improving, quality is probably improving. If one is
improving and the others aren’t, you’re probably seeing Goodhart’s Law
in action.
I call this the “conviction standard” — borrowed from journalism. One
source is a tip. Two sources is a story. Three independent sources is a
conviction. Don’t make decisions based on tips.
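The cross-check can be automated in a rough way. The sketch below
compares the first and second half of a period for each indicator and
flags the case where exactly one of them improved. The metric names, the
six-month figures, and the 5% threshold are all assumptions chosen for
illustration, and every indicator here is treated as “lower is better.”

```python
from statistics import mean


def improved(series, min_change=0.05):
    """True if the second half of the series averages at least min_change
    (as a fraction) lower than the first half. Assumes lower is better
    and a nonzero first-half mean."""
    half = len(series) // 2
    before, after = mean(series[:half]), mean(series[half:])
    return (after - before) / before <= -min_change


# Hypothetical six-month history for five indicators.
metrics = {
    "defect_rate_pct": [2.1, 2.0, 1.2, 0.9, 0.4, 0.3],
    "customer_complaints": [14, 15, 13, 15, 14, 14],
    "warranty_claims": [22, 21, 23, 22, 22, 23],
    "audit_findings": [6, 5, 6, 6, 5, 6],
    "quality_staff_turnover_pct": [8, 8, 9, 8, 9, 8],
}

verdicts = {name: improved(values) for name, values in metrics.items()}
improving = [name for name, ok in verdicts.items() if ok]
lone_improver = len(improving) == 1

if lone_improver:
    print(f"Only {improving[0]} improved; treat it as a tip, not a conviction.")
```

A half-versus-half comparison is crude, and a real review would look at
the trend in more detail. The point is the structure: no single
indicator gets to declare victory on its own.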
Rotate and Refresh Your Metrics
Every metric has a shelf life. The moment people figure out how to
optimize it without improving the underlying reality, it’s dead.
The solution isn’t to abandon measurement. It’s to rotate your
metrics periodically — not randomly, but strategically. Use a core set
of health indicators that remain stable over time, supplemented by
rotating investigative metrics that probe different dimensions of
quality.
Think of it like a medical checkup. You always check blood pressure
and heart rate (stable core metrics). But your doctor also rotates in
different tests based on age, symptoms, and risk factors (investigative
metrics). You don’t stop going to the doctor just because you know your
resting heart rate. You add new measurements as the situation
evolves.
Separate Measurement From Management
The person who measures should not be the person who is measured.
This isn’t about trust — it’s about architecture.
In the best quality systems I’ve seen, the data collection function
is independent from the operational function. Not adversarial —
independent. The production team runs the process. The quality team
monitors the metrics. Neither reports to the other. The data flows
freely in both directions, but nobody has the ability to simultaneously
produce and evaluate the same number.
This is why external audits work — not because external auditors are
smarter, but because they have no stake in the numbers they’re
examining. Build some of that independence into your internal
systems.
A Personal Observation
After twenty-five years in this field, I’ve come to believe that the
single most dangerous sentence in quality management is: “What
gets measured gets managed.”
It’s widely attributed to Peter Drucker, though the attribution is
disputed. It’s not wrong, but it’s incomplete. The full truth is closer
to: “What gets measured gets managed, and whatever you attach
consequences to gets gamed.”
The organizations with the best quality cultures I’ve ever seen don’t
have the best metrics. They have the best conversations about metrics.
They treat numbers as the beginning of a discussion, not the end of one.
They use dashboards as conversation starters, not verdicts.
In a plant in southern Germany, I watched a quality manager present a
month of data to his team. The OEE had dropped. The scrap rate had
risen. Two metrics moving in the wrong direction. He didn’t ask who was
responsible. He asked: “What is the data trying to tell us?”
That question — not the numbers themselves — is the antidote to
Goodhart’s Law.
The Hard Truth
Your metrics are lying to you. Not because they’re wrong, but because
you asked them a question they can’t honestly answer. You asked them,
“Are we doing well?” And they told you what you wanted to hear — because
that’s what happens when you tie your self-esteem, your budget, and your
bonus to the answer.
The path forward isn’t better metrics. It’s better relationships with
metrics. Treat them as imperfect windows into a complex reality, not as
mirrors that reflect your excellence back at you.
The best quality system I ever built didn’t have the most
sophisticated dashboard. It had the most honest culture. And that
culture was built on a single principle that every member of the team
could recite from memory:
The number is not the thing. The number points toward the
thing. Never confuse the two.
Peter Stasko is a Quality Architect with 25+ years of experience
transforming organizations across automotive, aerospace, and
pharmaceutical industries. He has spent decades helping companies see
past their dashboards and into the real processes that determine whether
quality is genuinely improving or just looking better on paper.