Outlier Detection

Outlier detection identifies data points that deviate significantly from the expected pattern of a dataset. In manufacturing, outliers may indicate measurement errors, tooling failures, material defects, or genuine process excursions that require investigation.

Why It Matters

Outliers are among the most contentious topics in quality data analysis. Removing a genuine process excursion hides a real problem; keeping a measurement error inflates variation estimates and depresses capability indices. The decision has direct business impact: a Cpk that drops below 1.33 because of one bad reading can trigger a formal corrective action request from a customer.
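The leverage of a single reading is easy to demonstrate. The sketch below uses illustrative numbers (hypothetical spec limits of 9.9 and 10.1) and the standard Cpk formula: the distance from the mean to the nearest spec limit, divided by three sample standard deviations.

```python
import statistics

def cpk(readings, lsl, usl):
    """Process capability index: distance from the mean to the nearest
    spec limit, in units of three sample standard deviations."""
    mu = statistics.mean(readings)
    sigma = statistics.stdev(readings)  # sample (n-1) standard deviation
    return min(usl - mu, mu - lsl) / (3 * sigma)

# Ten in-control readings against hypothetical spec limits 9.9 / 10.1.
clean = [9.98, 10.02, 9.97, 10.03, 10.00, 9.99, 10.01, 10.02, 9.98, 10.00]
print(cpk(clean, 9.9, 10.1))            # ~1.67: comfortably capable

# One bad reading inflates the standard deviation and drags Cpk
# below the common 1.33 customer threshold.
print(cpk(clean + [10.15], 9.9, 10.1))  # ~0.59: triggers scrutiny
```

The clean sample is capable; appending a single reading of 10.15 inflates the standard deviation enough to cut the index by more than half.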

Traditional outlier detection methods (Grubbs' test, Dixon's Q test, IQR fences) all assume the "good" data is normally distributed. When the underlying process is non-normal, these tests either miss real outliers or flag legitimate data points as anomalous.
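The failure mode on non-normal data is concrete. Below is a minimal sketch of the classic Tukey 1.5×IQR fence (quartiles taken as medians of the lower and upper halves) applied to illustrative right-skewed data, where a perfectly plausible tail value gets flagged:

```python
def iqr_fences(data, k=1.5):
    """Classic Tukey fences: Q1 - k*IQR and Q3 + k*IQR.
    Quartiles computed as medians of the lower and upper halves."""
    s = sorted(data)
    half = len(s) // 2

    def median(xs):
        m = len(xs)
        mid = m // 2
        return xs[mid] if m % 2 else (xs[mid - 1] + xs[mid]) / 2

    q1 = median(s[:half])
    q3 = median(s[-half:])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Right-skewed but entirely plausible process data
# (e.g. a surface-roughness measurement with a long upper tail).
skewed = [1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.5, 3.2, 4.5, 7.0]
lo, hi = iqr_fences(skewed)
flagged = [x for x in skewed if x < lo or x > hi]
print(flagged)  # [7.0] -- a legitimate tail point flagged as an "outlier"
```

On a skewed process, the upper fence lands well inside the legitimate tail, so the rule flags real process data rather than errors.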

The fundamental challenge is that outlier detection requires knowing the expected distribution — but knowing the distribution requires having clean data without outliers. This circular dependency makes outlier handling one of the most error-prone steps in traditional quality analytics.

The EntropyStat Perspective

EntropyStat breaks the outlier circularity by using methods that are inherently robust to outliers without needing to identify and remove them first. The EGDF uses supremum-based optimization, which means that extreme values do not distort the fitted distribution the way they distort least-squares-based methods.
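The EGDF's internals are beyond the scope of a glossary entry, but the sensitivity contrast it exploits can be illustrated generically. A least-squares location estimate (the mean) chases a single extreme value, while a robust estimate (here the median, used purely as a familiar stand-in, not as EntropyStat's method) stays anchored to the dense core:

```python
import statistics

core = [9.9, 9.95, 10.0, 10.0, 10.05, 10.1]
with_extreme = core + [50.0]

# The mean minimizes squared error, so one extreme value drags it
# far from the dense core of the data...
print(statistics.mean(core), statistics.mean(with_extreme))      # 10.0 -> ~15.7

# ...while a robust estimator such as the median barely moves.
print(statistics.median(core), statistics.median(with_extreme))  # 10.0 -> 10.0
```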

In practice, this means EntropyStat skips the detect-remove-refit cycle entirely. The EGDF is fitted once, on the full dataset including potential outliers, and the result is a distribution estimate dominated by the dense core of the data. Outlier identification becomes a downstream diagnostic rather than an upstream prerequisite — the fitted EGDF can identify which points lie in the extreme tails, but the fit itself is not corrupted by their presence.

For cases where explicit outlier identification is needed (e.g., regulatory documentation), the ELDF provides additional resolution. Points that fall between ELDF-detected clusters — belonging to neither the main distribution nor a coherent secondary cluster — are genuine outlier candidates. This geometric identification is more principled than threshold-based rules that depend on normality assumptions.
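ELDF's cluster detection is specific to EntropyStat, but the geometric idea of a point stranded between clusters can be sketched crudely: split sorted data wherever consecutive points are far apart, and treat singleton groups as outlier candidates. The gap threshold below is an illustrative assumption, not an ELDF parameter:

```python
def gap_clusters(data, gap=0.5):
    """Split sorted data wherever consecutive points are more than
    `gap` apart; return the resulting runs as clusters."""
    s = sorted(data)
    clusters = [[s[0]]]
    for prev, cur in zip(s, s[1:]):
        if cur - prev > gap:
            clusters.append([cur])   # large gap: start a new cluster
        else:
            clusters[-1].append(cur)
    return clusters

# Two coherent clusters plus one point stranded between them.
readings = [9.9, 9.95, 10.0, 10.05, 10.1, 11.2, 11.95, 12.0, 12.05, 12.1]
clusters = gap_clusters(readings)
candidates = [c[0] for c in clusters if len(c) == 1]
print(candidates)  # [11.2] -- belongs to neither cluster
```

The stranded reading at 11.2 forms a singleton group: it belongs to neither the main cluster nor the secondary one, which is exactly the geometric signature the text describes.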
