Outlier Detection

Outlier detection identifies data points that deviate significantly from the expected pattern of a dataset. In manufacturing, outliers may indicate measurement errors, tooling failures, material defects, or genuine process excursions that require investigation.

Why It Matters

Outliers are among the most contentious topics in quality data analysis. Removing a genuine process excursion hides a real problem; keeping a measurement error inflates variation estimates and depresses capability indices. The decision has direct business impact: a Cpk that drops below 1.33 because of one bad reading can trigger a formal corrective action request from a customer.
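The leverage of a single reading is easy to demonstrate. The sketch below uses illustrative numbers (hypothetical spec limits of 9.9 and 10.1) and the standard Cpk formula: the distance from the mean to the nearest spec limit, divided by three sample standard deviations.

```python
import statistics

def cpk(readings, lsl, usl):
    """Process capability index: distance from the mean to the nearest
    spec limit, in units of three sample standard deviations."""
    mu = statistics.mean(readings)
    sigma = statistics.stdev(readings)  # sample (n-1) standard deviation
    return min(usl - mu, mu - lsl) / (3 * sigma)

# Ten in-control readings against hypothetical spec limits 9.9 / 10.1.
clean = [9.98, 10.02, 9.97, 10.03, 10.00, 9.99, 10.01, 10.02, 9.98, 10.00]
print(cpk(clean, 9.9, 10.1))            # ~1.67: comfortably capable

# One bad reading inflates the standard deviation and drags Cpk
# below the common 1.33 customer threshold.
print(cpk(clean + [10.15], 9.9, 10.1))  # ~0.59: triggers scrutiny
```

The clean sample is capable; appending a single reading of 10.15 inflates the standard deviation enough to cut the index by more than half.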

Traditional outlier detection methods (Grubbs' test, Dixon's Q test, IQR fences) all assume the "good" data is normally distributed. When the underlying process is non-normal, these tests either miss real outliers or flag legitimate data points as anomalous.
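The failure mode on non-normal data is concrete. Below is a minimal sketch of the classic Tukey 1.5×IQR fence (quartiles taken as medians of the lower and upper halves) applied to illustrative right-skewed data, where a perfectly plausible tail value gets flagged:

```python
def iqr_fences(data, k=1.5):
    """Classic Tukey fences: Q1 - k*IQR and Q3 + k*IQR.
    Quartiles computed as medians of the lower and upper halves."""
    s = sorted(data)
    half = len(s) // 2

    def median(xs):
        m = len(xs)
        mid = m // 2
        return xs[mid] if m % 2 else (xs[mid - 1] + xs[mid]) / 2

    q1 = median(s[:half])
    q3 = median(s[-half:])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Right-skewed but entirely plausible process data
# (e.g. a surface-roughness measurement with a long upper tail).
skewed = [1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.5, 3.2, 4.5, 7.0]
lo, hi = iqr_fences(skewed)
flagged = [x for x in skewed if x < lo or x > hi]
print(flagged)  # [7.0] -- a legitimate tail point flagged as an "outlier"
```

On a skewed process, the upper fence lands well inside the legitimate tail, so the rule flags real process data rather than errors.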

The fundamental challenge is that outlier detection requires knowing the expected distribution — but knowing the distribution requires having clean data without outliers. This circular dependency makes outlier handling one of the most error-prone steps in traditional quality analytics.

The EntropyStat Perspective

EntropyStat breaks the outlier circularity by using methods that are inherently robust to outliers without needing to identify and remove them first. The EGDF uses supremum-based optimization, which means that extreme values do not distort the fitted distribution the way they distort least-squares-based methods.
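The EGDF's internals are beyond the scope of a glossary entry, but the sensitivity contrast it exploits can be illustrated generically. A least-squares location estimate (the mean) chases a single extreme value, while a robust estimate (here the median, used purely as a familiar stand-in, not as EntropyStat's method) stays anchored to the dense core:

```python
import statistics

core = [9.9, 9.95, 10.0, 10.0, 10.05, 10.1]
with_extreme = core + [50.0]

# The mean minimizes squared error, so one extreme value drags it
# far from the dense core of the data...
print(statistics.mean(core), statistics.mean(with_extreme))      # 10.0 -> ~15.7

# ...while a robust estimator such as the median barely moves.
print(statistics.median(core), statistics.median(with_extreme))  # 10.0 -> 10.0
```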

In practice, this means EntropyStat skips the detect-remove-refit cycle entirely. The EGDF is fitted once, on the full dataset including potential outliers, and the result is a distribution estimate dominated by the dense core of the data. Outlier identification becomes a downstream diagnostic rather than an upstream prerequisite — the fitted EGDF can identify which points lie in the extreme tails, but the fit itself is not corrupted by their presence.

For cases where explicit outlier identification is needed (e.g., regulatory documentation), the ELDF provides additional resolution. Points that fall between ELDF-detected clusters — belonging to neither the main distribution nor a coherent secondary cluster — are genuine outlier candidates. This geometric identification is more principled than threshold-based rules that depend on normality assumptions.
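ELDF's cluster detection is specific to EntropyStat, but the geometric idea of a point stranded between clusters can be sketched crudely: split sorted data wherever consecutive points are far apart, and treat singleton groups as outlier candidates. The gap threshold below is an illustrative assumption, not an ELDF parameter:

```python
def gap_clusters(data, gap=0.5):
    """Split sorted data wherever consecutive points are more than
    `gap` apart; return the resulting runs as clusters."""
    s = sorted(data)
    clusters = [[s[0]]]
    for prev, cur in zip(s, s[1:]):
        if cur - prev > gap:
            clusters.append([cur])   # large gap: start a new cluster
        else:
            clusters[-1].append(cur)
    return clusters

# Two coherent clusters plus one point stranded between them.
readings = [9.9, 9.95, 10.0, 10.05, 10.1, 11.2, 11.95, 12.0, 12.05, 12.1]
clusters = gap_clusters(readings)
candidates = [c[0] for c in clusters if len(c) == 1]
print(candidates)  # [11.2] -- belongs to neither cluster
```

The stranded reading at 11.2 forms a singleton group: it belongs to neither the main cluster nor the secondary one, which is exactly the geometric signature the text describes.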
