The Numbers Everyone Wants and Nobody Earns
Process capability analysis is supposed to be the moment of truth in
quality engineering. You’ve been collecting data. You’ve been plotting
control charts. Now you run the numbers and produce two indices — Cp and
Cpk — that tell you, with elegant mathematical precision, whether your
process can actually meet specifications. It’s the quality
professional’s definitive answer to the most important question in
manufacturing: Can we make this part consistently within
tolerance?
The answer, in most organizations, cannot be trusted. Not because the
processes are incapable, though many are, but because the numbers say
yes when reality says no. Walk through any factory floor and ask the quality
engineer for the capability study on a critical characteristic. You’ll
get a spreadsheet. The spreadsheet will show a Cpk of 1.67, sometimes
2.0, occasionally even higher. These are beautiful numbers. They are
also, more often than not, complete fiction.
The fiction doesn’t start with dishonesty. It starts with something
worse: a systematic series of methodological errors that transforms a
rigorous statistical tool into performance art. By the time the number
reaches the customer’s audit report, it has been filtered through so
many layers of selective sampling, data manipulation, and mathematical
misunderstanding that it bears no relationship to what’s actually
happening on the production line.
What Process Capability Actually Means
Let’s start with the honest version. Cp measures what your process
could do if it were perfectly centered between the
specification limits. It’s the ratio of the specification width to the
process width, where process width is defined as six standard
deviations (the “process spread”). A Cp of 1.0 means your process spread
exactly fills the specification window. A Cp of 1.33 means the spread
uses about 75% of the window, which leaves some breathing room. A Cp of
2.0 means your process spread takes up only half the specification
width: very capable, very comfortable.
Cpk adds one crucial element: it accounts for centering. You can have
a magnificent Cp — your process spread is tight and well within spec —
but if the process is shifted off-center, your Cpk will be dramatically
lower. Cpk is the lesser of two values: one measuring capability against
the upper spec limit, the other against the lower. It tells you not just
how narrow your distribution is, but how well it fits inside the window
where it needs to be.
This is the elegant simplicity of process capability. Two numbers.
One tells you about potential. The other tells you about reality.
Together, they should give you a clear, honest picture of whether your
process can deliver.
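The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a validated study tool: the function name `cp_cpk`, the ten measurements, and the 9.90 to 10.10 mm limits are all made up for the example, and ten points are far too few for a real study.

```python
from statistics import mean, stdev

def cp_cpk(data, lsl, usl):
    """Return (Cp, Cpk) using the overall sample standard deviation."""
    mu = mean(data)
    sigma = stdev(data)  # sample standard deviation (n - 1 denominator)
    cp = (usl - lsl) / (6 * sigma)                # spec width / process width
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)   # distance to the nearer limit
    return cp, cpk

# Hypothetical measurements (mm) against a hypothetical 9.90-10.10 mm spec.
centered = [9.98, 10.01, 10.00, 9.99, 10.02, 10.00, 9.97, 10.03, 10.01, 9.99]
shifted = [x + 0.04 for x in centered]  # same spread, mean moved toward the USL

cp_c, cpk_c = cp_cpk(centered, 9.90, 10.10)
cp_s, cpk_s = cp_cpk(shifted, 9.90, 10.10)
print(f"centered: Cp={cp_c:.2f} Cpk={cpk_c:.2f}")
print(f"shifted:  Cp={cp_s:.2f} Cpk={cpk_s:.2f}")
```

Both data sets have the same spread, so Cp is identical. Only Cpk notices that the second one has drifted toward the upper limit.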
Where the Fiction Begins
Here’s what actually happens in practice.
Selective Sampling: The Data That Never Makes the Chart
The most common form of capability fraud is also the hardest to see:
selective sampling. The operator collects thirty consecutive parts at
the start of the shift, when the machine is fresh, the tooling is new,
and the material lot is from the supplier’s best batch. These thirty
parts represent the process at its absolute best — a snapshot of
perfection that has nothing to do with what the process does over an
eight-hour run.
The engineer runs the numbers on those thirty golden parts and
produces a Cpk of 2.1. The customer is delighted. The audit passes. And
the reality — that the process drifts, the tool wears, the material
changes, and by hour six the Cpk has dropped to 0.8 — never appears in
any document.
This isn’t necessarily malicious. Often the engineer doesn’t know
better. They were told to “do a capability study” and interpreted that
as “collect some data and run the formula.” The concept of
representative sampling — capturing the full range of variation sources
that the process experiences over time — was never part of the
training.
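A small simulation makes the golden-parts effect concrete. Everything here is an assumption chosen for illustration: the linear drift model, the noise level, the spec limits, and the one-part-per-minute rate. Real tool wear rarely follows such a tidy line.

```python
import random
from statistics import mean, stdev

def cpk(data, lsl, usl):
    mu, sigma = mean(data), stdev(data)
    return min(usl - mu, mu - lsl) / (3 * sigma)

rng = random.Random(42)   # fixed seed so the sketch is repeatable
LSL, USL = 9.90, 10.10    # hypothetical spec limits (mm)
PARTS_PER_SHIFT = 480     # hypothetical: one part per minute for 8 hours

# Assumed drift model: the mean creeps upward linearly as the tool wears,
# with constant part-to-part noise on top.
shift = [
    10.00 + 0.06 * (i / PARTS_PER_SHIFT) + rng.gauss(0, 0.012)
    for i in range(PARTS_PER_SHIFT)
]

golden = shift[:30]       # first thirty parts off a fresh tool
last_hour = shift[-60:]   # what the process looks like at the end of the shift

cpk_golden = cpk(golden, LSL, USL)
cpk_full = cpk(shift, LSL, USL)
cpk_last = cpk(last_hour, LSL, USL)
print(f"first 30 parts: {cpk_golden:.2f}")
print(f"full shift:     {cpk_full:.2f}")
print(f"last hour:      {cpk_last:.2f}")
```

The same machine, the same shift, and the same formula give three different answers. The only thing that changed is which parts were allowed into the study.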
Normality Assumption: The Bell Curve That Doesn’t Exist
The conventional reading of Cp and Cpk rests on one fundamental
assumption: your data follows a normal distribution, so that six
standard deviations cover about 99.73% of what the process produces.
This assumption is almost never validated, and even when it is, the
validation is frequently done wrong.
Here’s what happens. The engineer collects data, dumps it into Excel,
and runs the capability formula. Maybe — maybe — they run a normality
test first. If the p-value comes back above 0.05, they proceed. If it
comes back below, they… proceed anyway. Or they delete a few
“outliers” until the test passes. Or they switch to a different test
that gives them the number they want.
But real manufacturing data is rarely normal. It’s skewed because of
tool wear. It’s bounded because you can’t measure below zero. It’s
bimodal because two different machines or two different operators are
feeding the same study. It’s truncated because parts that are visibly
out of spec are pulled from the line before they’re measured. Each of
these realities distorts the normality assumption, and each distortion
inflates or deflates the capability index in ways the engineer doesn’t
understand.
When your data isn’t normal and you pretend it is, your Cpk isn’t
just wrong; the direction of the error is unpredictable. You might be
reporting 1.67 when the true capability is 1.1, or 1.33 when it is 0.9.
If the long tail points away from the nearer spec limit, you might
equally be understating a process that is doing fine. You simply don’t
know, and neither does your customer.
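To see how far off the normal model can be, compare the out-of-spec fraction it predicts with the fraction that actually occurs in skewed data. This sketch assumes a lognormal process (think flatness or runout, which cannot go below zero) and a hypothetical one-sided upper limit of 3.0; neither comes from a real study.

```python
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(7)
USL = 3.0  # hypothetical one-sided upper spec limit

# Assumed right-skewed process: lognormal readings bounded below by zero.
data = [rng.lognormvariate(0.0, 0.5) for _ in range(20000)]

mu, sigma = mean(data), stdev(data)
cpu_normal = (USL - mu) / (3 * sigma)  # upper-side index from the standard formula

# Fraction above the USL if the data really were normal with this mean and sigma
predicted = 1 - NormalDist(mu, sigma).cdf(USL)
# Fraction above the USL actually present in the data
observed = sum(x > USL for x in data) / len(data)

print(f"index from the normal formula: {cpu_normal:.2f}")
print(f"predicted fraction above USL:  {predicted:.4%}")
print(f"observed fraction above USL:   {observed:.4%}")
```

Here the long tail points at the limit, so the normal model is optimistic about how many parts fall out. Put the limit on the short-tail side instead and the same formula errs in the other direction.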
Subgrouping Strategy: The Variation You Hid
The way you subgroup your data determines what variation you see and
what variation you hide. This is the most subtle and most dangerous
source of capability fraud, because it’s invisible even to quality
professionals who should know better.
Short-term capability studies — the ones most commonly reported to
customers — use small subgroups collected over a brief period. They
capture only within-subgroup variation: the natural fluctuation from
part to part in a narrow window of time. They ignore between-subgroup
variation: the shifts that happen across material lots, tool changes,
operator changes, environmental conditions, and the thousand other
sources of long-term variation that production actually experiences.
The result is predictable. Your short-term Cpk is 2.0. Your long-term
capability (usually reported as Ppk), the one that reflects what your
customer actually receives over six months of production, is 1.0 or
lower. Six Sigma literature conventionally models the gap between
short-term and long-term performance as a “1.5 sigma shift,” and while
the exact magnitude is debatable, the existence of the gap is not. It’s
real, it’s significant, and almost no one reports the long-term number.
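The difference between the two numbers comes down to which standard deviation goes into the denominator. The sketch below uses a pooled within-subgroup standard deviation for the short-term index and the overall standard deviation for the long-term one. That is one common convention, not the only one (R-bar/d2 is another), and the simulated process and spec limits are assumptions made for the example.

```python
import random
from statistics import mean, stdev, variance

rng = random.Random(11)
LSL, USL = 9.90, 10.10  # hypothetical spec limits (mm)

# Assumed process: 40 subgroups of 5 parts. Each subgroup's center wanders
# (lot, tool, operator effects) with sd 0.02, while part-to-part noise
# inside a subgroup has sd 0.01.
subgroups = []
for _ in range(40):
    center = rng.gauss(10.00, 0.02)
    subgroups.append([rng.gauss(center, 0.01) for _ in range(5)])

all_parts = [x for sg in subgroups for x in sg]
grand_mean = mean(all_parts)

# Within-subgroup sigma: pooled standard deviation (equal subgroup sizes)
sigma_within = mean([variance(sg) for sg in subgroups]) ** 0.5
# Overall sigma: every part treated as one long-run sample
sigma_overall = stdev(all_parts)

def index(sigma):
    return min(USL - grand_mean, grand_mean - LSL) / (3 * sigma)

cpk_short = index(sigma_within)   # sees only part-to-part noise
ppk_long = index(sigma_overall)   # also sees the wandering between subgroups
print(f"short-term (within) index: {cpk_short:.2f}")
print(f"long-term (overall) index: {ppk_long:.2f}")
```

The data set is identical for both numbers. Reporting only the first one hides every source of variation that acts between subgroups.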
The Specification Trap: Tighter Isn’t Better
Some organizations inflate their capability numbers by playing with
specifications rather than data. Here’s how it works: the engineering
drawing specifies a tolerance of ±0.05 mm. The quality team runs the
study and gets a Cpk of 1.1 — barely passing. Someone in engineering
decides to “review” the tolerance. After a meeting and a few emails, the
tolerance is opened up to ±0.10 mm. The Cpk jumps to 2.2. The customer
is reassured. The process hasn’t changed. The parts haven’t improved.
The capability hasn’t increased by a single sigma. You just widened the
goalposts.
This happens more often than anyone admits, and it’s particularly
insidious because it’s technically legitimate — engineering changes
are allowed, and sometimes tolerances genuinely are
over-specified. But when the motivation is “we need a better Cpk number
for the audit” rather than “this tolerance doesn’t reflect functional
requirements,” the practice corrupts the entire capability
framework.
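The arithmetic behind the tolerance example is worth spelling out. This sketch assumes the process is perfectly centered, which is the case in which doubling the tolerance exactly doubles Cpk; the helper name `cpk_centered` is made up for the illustration.

```python
def cpk_centered(half_tolerance, sigma):
    """Cpk for a process whose mean sits exactly on nominal."""
    return half_tolerance / (3 * sigma)

# Back out the sigma implied by Cpk = 1.1 at +/-0.05 mm (centered process assumed)
sigma = 0.05 / (3 * 1.1)

cpk_before = cpk_centered(0.05, sigma)  # original drawing tolerance
cpk_after = cpk_centered(0.10, sigma)   # tolerance opened up, sigma untouched
print(f"sigma={sigma:.4f} mm, Cpk before={cpk_before:.2f}, after={cpk_after:.2f}")
# prints: sigma=0.0152 mm, Cpk before=1.10, after=2.20
```

Sigma never changes between the two calls. The index doubled because the numerator did.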
What a Real Capability Study Looks Like
A genuine process capability study requires four things, and omitting
any one of them produces a number that’s worse than useless: one that
actively misleads.
First, representative data. Not thirty golden parts
from a fresh tool. You need data that spans the full range of production
conditions: multiple material lots, multiple operators, multiple tool
states (new, middle-aged, end-of-life), multiple shifts, environmental
variation. For a high-volume process, this means weeks or months of
data, not an afternoon. The sample size should be at least 100 data
points, ideally 200 or more, collected using a sampling plan that
deliberately captures known sources of variation rather than avoiding
them.
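One way to see why thirty parts are not enough is to put a confidence interval around the index. The sketch below uses Bissell's approximation for the standard error of an estimated Cpk, which assumes normal, independent data from a stable process. The function name `cpk_interval` is made up for the example, and the interval covers sampling error only, not unrepresentative sampling.

```python
from math import sqrt
from statistics import NormalDist

def cpk_interval(cpk_hat, n, confidence=0.95):
    """Approximate two-sided confidence interval for Cpk (Bissell's approximation).

    Assumes normally distributed, independent data from a stable process.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(1 / (9 * n) + cpk_hat**2 / (2 * (n - 1)))
    return cpk_hat - half_width, cpk_hat + half_width

for n in (30, 100, 200):
    low, high = cpk_interval(1.33, n)
    print(f"n={n:>3}: an estimate of 1.33 is consistent with {low:.2f} to {high:.2f}")
```

With a point estimate of 1.33, thirty parts leave an interval of roughly 0.97 to 1.69, which cannot even tell you whether the process clears 1.0. At two hundred parts the interval narrows to roughly 1.19 to 1.47.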
Second, verified normality. Not a rubber-stamp
normality test. Use multiple tools: Anderson-Darling, Shapiro-Wilk, a
normal probability plot, and — critically — your own eyes. Look at the
histogram. Does it look normal? Look at the probability plot. Do the
points track the line? Are there patterns — curvature, gaps, outliers —
that suggest a non-normal distribution? If the data isn’t normal, use a
non-normal capability analysis (Box-Cox transformation, Johnson
transformation, or a distribution-specific approach like Weibull or
lognormal). Don’t just run the standard formula and hope.
Third, rational subgrouping. Your subgrouping
strategy should reflect the question you’re actually asking. If you want
to know whether the process can hold tolerance over a full production
run, you need long-term data. If you want to understand short-term
potential, use short-term data — but label it honestly. Report both. Let
the gap between them tell you how much your process is drifting, and use
that information to fix the drift rather than hide it.
Fourth, a stability check. This is the prerequisite
that almost everyone skips: your process must be in statistical control
before you can compute a meaningful capability index. If your process is
unstable — if it has special causes acting on it — then capability
indices are meaningless. You can’t predict the output of an unstable
process. Run the control chart first. If there are out-of-control
points, investigate and eliminate the special causes. Then — and only
then — run the capability analysis.
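As a minimal sketch of "run the control chart first," here is an individuals chart with limits derived from the average moving range. It applies only the simplest rule, a point beyond the three-sigma limits. The readings are invented, and a real study would also apply run rules and investigate whatever it flags.

```python
from statistics import mean

def individuals_limits(data):
    """Lower limit, center line, and upper limit for an individuals (X) chart."""
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    center = mean(data)
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of two consecutive points
    half_width = 2.66 * mean(moving_ranges)
    return center - half_width, center, center + half_width

def out_of_control(data):
    """Indices of points beyond the control limits (the simplest detection rule)."""
    lcl, _, ucl = individuals_limits(data)
    return [i for i, x in enumerate(data) if x < lcl or x > ucl]

# Hypothetical readings (mm) with one disturbed part at index 8
readings = [10.01, 9.99, 10.00, 10.02, 9.98, 10.01, 10.00, 9.99, 10.12,
            10.00, 10.01, 9.99, 10.02, 10.00, 9.98, 10.01]

flagged = out_of_control(readings)
if flagged:
    print(f"special cause suspected at points {flagged}; fix that before computing Cpk")
else:
    print("no points beyond the limits; capability analysis can proceed")
```

Only when this list comes back empty, and stays empty after the causes are removed, does a capability index describe anything predictable.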
The Culture Behind the Numbers
The technical errors in capability analysis all stem from the same
cultural root: organizations that treat capability indices as report
cards rather than diagnostic tools.
When Cpk is a number you need for a customer audit, the incentive
structure is clear. You need it to be high. You need it to be
impressive. You need it to prove that your process is capable. The
number becomes a deliverable, not an insight. And when the number is the
deliverable, every step of the analysis — sampling, normality testing,
subgrouping, specification setting — is subtly (or not so subtly)
optimized to produce the desired result.
The alternative culture is harder to build but far more valuable. In
this culture, capability indices are conversations. A Cpk of 1.1 isn’t a
failure — it’s information. It tells you where the process is
struggling, where you need to invest, what problems to prioritize. A Cpk
of 2.0 isn’t a triumph — it’s an opportunity to ask whether you’re
over-engineering, whether you could reduce cost by loosening controls,
whether the excess capability could be redirected to increase
throughput.
In this culture, the honesty of the number matters more than the
magnitude of the number. A verified, well-collected, properly analyzed
Cpk of 1.15 is infinitely more valuable than a fabricated,
cherry-picked, mathematically abused Cpk of 2.5. The first gives you a
foundation for improvement. The second gives you a false sense of
security that will eventually collapse — usually in front of the
customer.
The Path Back to Honesty
If your organization has been producing inflated capability numbers,
the path back to honesty is uncomfortable but necessary.
Start with your most critical characteristics — the ones tied to
safety, regulatory compliance, or core product performance. Re-run those
capability studies from scratch, using proper sampling plans, verified
distributions, and adequate data windows. Commit to reporting long-term
capability alongside short-term capability. Accept that the numbers will
likely be worse than what you’ve been reporting. That’s the point.
Share the honest numbers with engineering. If the true Cpk is lower
than what was previously reported, that’s a signal for action — not an
excuse to manipulate the data. Identify the sources of long-term
variation that are eating into capability. Launch improvement projects.
Track the Cpk over time as those improvements take hold.
This is what process capability was always meant to be: not a trophy
number for audit season, but a living metric that drives continuous
improvement. The mathematical framework is sound. The indices work. What
fails is not the tool — what fails is the willingness to let the tool
tell you the truth.
Peter Stasko is a Quality Architect with over 25
years of experience in manufacturing quality management, process
improvement, and statistical methods. He has implemented capability
analysis programs across automotive, electronics, and medical device
industries, and has spent decades teaching organizations that an honest
Cpk of 1.2 is worth more than a fabricated Cpk of 2.5.