Cluster Detection

Cluster detection in quality analytics identifies distinct subgroups (modes) within process data. Unlike outlier detection, which flags individual extreme points, cluster detection finds coherent subpopulations that may have different means, variances, or distribution shapes.

Why It Matters

Multimodal data is surprisingly common in manufacturing. Two tool inserts with slightly different geometry produce two clusters of measurements. A thermal drift creates a gradual transition between two operating modes. Material from two different suppliers creates distinct populations within the same production run.

When clusters are present, single-distribution statistics such as the mean, standard deviation, Cpk, and control limits describe a population that does not exist. The mean falls between the clusters, where there may be little or no data. The standard deviation is inflated by the inter-cluster distance. Control limits computed from that inflated spread are too wide to detect shifts within either cluster.
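The effect is easy to reproduce with simulated data. The sketch below is illustrative only: the two tool inserts, their centers, the spread, and the spec limits are all hypothetical values chosen for the example, and Cpk is computed with the conventional mean-and-sigma formula.

```python
import random
import statistics

def cpk(values, lsl, usl):
    """Conventional Cpk: distance from the mean to the nearest spec limit, over 3 sigma."""
    mean = statistics.fmean(values)
    sigma = statistics.stdev(values)
    return min(usl - mean, mean - lsl) / (3 * sigma)

rng = random.Random(42)

# Hypothetical process: two tool inserts cutting 0.10 mm apart with the same spread.
insert_a = [rng.gauss(9.95, 0.01) for _ in range(500)]
insert_b = [rng.gauss(10.05, 0.01) for _ in range(500)]
pooled = insert_a + insert_b

LSL, USL = 9.88, 10.12  # hypothetical spec limits

pooled_mean = statistics.fmean(pooled)
pooled_sigma = statistics.stdev(pooled)
within_sigma = statistics.stdev(insert_a)

# Share of measurements that actually lie near the pooled mean.
near_mean = sum(abs(x - pooled_mean) < 0.02 for x in pooled) / len(pooled)

print(f"pooled mean            : {pooled_mean:.4f}")
print(f"share within 0.02 of it: {near_mean:.2%}")
print(f"pooled sigma           : {pooled_sigma:.4f}")
print(f"sigma within insert A  : {within_sigma:.4f}")
print(f"pooled 3-sigma limits  : {pooled_mean - 3 * pooled_sigma:.4f} .. "
      f"{pooled_mean + 3 * pooled_sigma:.4f}")
print(f"Cpk pooled / insert A  : {cpk(pooled, LSL, USL):.2f} / "
      f"{cpk(insert_a, LSL, USL):.2f}")
```

With these settings the pooled sigma is several times the within-insert sigma, and almost no measurements lie near the pooled mean, so limits built from the pooled numbers say little about either insert.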

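Traditional cluster detection methods carry assumptions that often do not hold on the shop floor. K-means requires the number of clusters in advance and implicitly favors compact, similarly sized clusters. Gaussian mixture models also need the number of components, or a separate model-selection step to choose it, and assume each cluster is normally distributed. In practice, quality engineers often do not know how many clusters to expect, and the clusters may not be Gaussian.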
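To make the "number of clusters in advance" limitation concrete, here is a minimal one-dimensional k-means (Lloyd's algorithm) written from scratch. The function name, the quantile-based initialization, and the simulated data are choices made for this sketch, not part of any particular library. Note that it returns exactly k centers even when the data has a single mode.

```python
import random
import statistics

def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's algorithm in one dimension; the caller must choose k."""
    ordered = sorted(values)
    # Start the centers at evenly spaced quantiles of the data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in values:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Recompute each center; keep the old one if a group ends up empty.
        new_centers = [
            statistics.fmean(g) if g else c for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

rng = random.Random(0)
# Hypothetical single-mode process: there is only one real population here.
single_mode = [rng.gauss(10.0, 0.02) for _ in range(600)]

centers_k2 = kmeans_1d(single_mode, k=2)
centers_k3 = kmeans_1d(single_mode, k=3)
print("k=2 centers:", [round(c, 4) for c in centers_k2])
print("k=3 centers:", [round(c, 4) for c in centers_k3])
```

The algorithm has no way to report "there is only one cluster"; it partitions the data into however many groups it was asked for.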

The EntropyStat Perspective

EntropyStat's ELDF (Entropic Local Distribution Function) detects clusters as a natural byproduct of local distribution analysis. Unlike k-means or GMM, the ELDF does not require specifying the number of clusters in advance — it discovers them from the local density structure of the data. And unlike GMM, it does not assume each cluster is normally distributed.

The detection works by analyzing the ELDF's density structure. Peaks in the local density correspond to cluster centers, and valleys correspond to boundaries between clusters. This topographic approach is robust to cluster shape: overlapping clusters, clusters of different sizes, and non-elliptical clusters are all detected naturally.
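The peak-and-valley idea can be sketched with an ordinary Gaussian kernel density estimate. This is explicitly a stand-in: it is not the ELDF and says nothing about how EntropyStat computes its density. The bandwidth, the relative-height filter, the grid size, and the simulated data are all assumptions made for this illustration.

```python
import math
import random

def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in values)
        for g in grid
    ]

def find_peaks(density, min_rel_height=0.1):
    """Indices of interior local maxima at least min_rel_height of the global max."""
    floor = min_rel_height * max(density)
    return [
        i for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] >= density[i + 1] and density[i] >= floor
    ]

def find_boundaries(density, peak_idx):
    """Index of the lowest density between each pair of neighboring peaks."""
    return [
        min(range(a, b + 1), key=lambda i: density[i])
        for a, b in zip(peak_idx, peak_idx[1:])
    ]

rng = random.Random(7)
# Hypothetical data: two subpopulations with different sizes and spreads.
data = ([rng.gauss(9.95, 0.010) for _ in range(300)]
        + [rng.gauss(10.05, 0.015) for _ in range(200)])

lo, hi = min(data), max(data)
grid = [lo + (hi - lo) * i / 199 for i in range(200)]
density = gaussian_kde(data, grid, bandwidth=0.008)  # bandwidth is a tuning choice

peak_idx = find_peaks(density)
peaks = [grid[i] for i in peak_idx]
boundaries = [grid[i] for i in find_boundaries(density, peak_idx)]
print("cluster centers (density peaks):", [round(p, 3) for p in peaks])
print("cluster boundaries (valleys)   :", [round(b, 3) for b in boundaries])
```

The number of clusters is never passed in; it falls out of how many peaks the density has. With a kernel estimate that count depends on the bandwidth, which is the tuning burden the section says the ELDF avoids.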

Once clusters are detected, EntropyStat fits a separate EGDF to each one. This means you get per-cluster capability indices, per-cluster control limits, and per-cluster tolerance intervals — all computed with the same assumption-free entropy methods that handle the global analysis. The engineer sees not just "your data has two clusters" but a complete analytical profile of each subpopulation.
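A per-cluster profile can be illustrated as follows. This sketch substitutes conventional mean-and-sigma formulas for the EGDF fit, so it shows only the shape of the workflow, not EntropyStat's calculations. The boundary value, spec limits, helper names, and data are hypothetical.

```python
import random
import statistics

def split_at_boundaries(values, boundaries):
    """Assign each value to the segment between consecutive boundaries."""
    cuts = sorted(boundaries)
    groups = [[] for _ in range(len(cuts) + 1)]
    for x in values:
        groups[sum(x > c for c in cuts)].append(x)
    return groups

def profile(values, lsl, usl):
    """Conventional summary for one group: size, mean, sigma, Cpk, 3-sigma limits."""
    mean = statistics.fmean(values)
    sigma = statistics.stdev(values)
    return {
        "n": len(values),
        "mean": mean,
        "sigma": sigma,
        "cpk": min(usl - mean, mean - lsl) / (3 * sigma),
        "limits": (mean - 3 * sigma, mean + 3 * sigma),
    }

rng = random.Random(11)
# Hypothetical data: two subpopulations inside one production run.
data = ([rng.gauss(9.95, 0.010) for _ in range(300)]
        + [rng.gauss(10.05, 0.015) for _ in range(200)])

LSL, USL = 9.90, 10.10   # hypothetical spec limits
boundary = 10.00         # hypothetical valley location from a density analysis

clusters = split_at_boundaries(data, [boundary])
pooled = profile(data, LSL, USL)
per_cluster = [profile(c, LSL, USL) for c in clusters]

print(f"pooled   : n={pooled['n']} sigma={pooled['sigma']:.4f} cpk={pooled['cpk']:.2f}")
for i, p in enumerate(per_cluster, start=1):
    print(f"cluster {i}: n={p['n']} sigma={p['sigma']:.4f} cpk={p['cpk']:.2f} "
          f"limits=({p['limits'][0]:.4f}, {p['limits'][1]:.4f})")
```

Each cluster gets its own spread, capability index, and limits, and these can differ substantially from one another and from the pooled figures.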

Related Terms

Homogeneity Testing

Homogeneity testing determines whether a dataset comes from a single statistical population or contains multiple subpopulations. In manufacturing, non-homogeneous data indicates that the process was not operating in a single stable mode during data collection.

ELDF (Entropic Local Distribution Function)

The ELDF is Machine Gnostics' local distribution analysis method. While the EGDF provides a global view of the entire distribution, the ELDF focuses on local structure — revealing peaks, clusters, and multimodal features hidden within the data.

Outlier Detection

Outlier detection identifies data points that deviate significantly from the expected pattern of a dataset. In manufacturing, outliers may indicate measurement errors, tooling failures, material defects, or genuine process excursions that require investigation.

8D Problem Solving

8D is a structured eight-discipline problem-solving methodology used in manufacturing to identify root causes, implement corrective actions, and prevent recurrence. It is widely required by automotive OEMs for formal customer complaint responses.

EGDF (Entropic Global Distribution Function)

The EGDF is Machine Gnostics' primary distribution estimation method. It constructs a smooth, continuous cumulative distribution function directly from data using entropy-based algebraic optimization, without assuming any parametric form such as normal or Weibull.
