Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov (K-S) test is a nonparametric goodness-of-fit test whose statistic is the maximum vertical distance between the empirical cumulative distribution function (ECDF) and a reference CDF: D = sup_x |F_n(x) − F(x)|. It determines whether a sample plausibly comes from a specified distribution.
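The one-sample test can be run in a few lines. This is a minimal sketch using SciPy (a library choice assumed here, not named above), testing a sample against a fully specified normal reference:

```python
# Minimal one-sample K-S test sketch with SciPy (assumed tooling).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=10.0, scale=2.0, size=200)

# Test against a fully specified N(10, 2) reference distribution.
d_stat, p_value = stats.kstest(sample, "norm", args=(10.0, 2.0))
print(f"D = {d_stat:.4f}, p = {p_value:.4f}")
# A large p-value means we cannot reject that the sample came from N(10, 2).
```

Note that the reference parameters (10, 2) are given in advance here; estimating them from the same sample changes the test's null distribution.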
Why It Matters
The K-S test is a critical validation tool in quality analytics. Before computing Cpk or setting control limits, engineers need to verify that the assumed distribution actually fits the data. A poor fit means every downstream calculation is unreliable.
Unlike the chi-square goodness-of-fit test, the K-S test does not require binning the data into arbitrary intervals, making it more sensitive to subtle distributional differences. It works with continuous data of any sample size, though its power increases with more observations. One caveat: the classical p-values assume the reference distribution is fully specified in advance; when its parameters are estimated from the same sample, a corrected variant such as the Lilliefors test gives more honest p-values.
In practice, the K-S test is used both for validating parametric fits (e.g., "Is this data really normal?") and for comparing two empirical distributions (e.g., "Has the process changed between these two batches?"). The two-sample variant is especially useful for process drift detection.
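The two-sample variant needs no reference distribution at all; it compares two ECDFs directly. A hedged sketch of batch-to-batch drift detection, again assuming SciPy and synthetic data:

```python
# Two-sample K-S test sketch: compare two production batches for drift.
# Batch values are synthetic stand-ins for real measurements.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
batch_a = rng.normal(loc=50.0, scale=1.0, size=150)  # baseline batch
batch_b = rng.normal(loc=50.6, scale=1.0, size=150)  # slightly shifted batch

d_stat, p_value = stats.ks_2samp(batch_a, batch_b)
print(f"D = {d_stat:.4f}, p = {p_value:.4f}")
# A small p-value suggests the two batches follow different distributions,
# i.e. the process may have drifted between them.
```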
The EntropyStat Perspective
EntropyStat uses the K-S test as its primary validation mechanism — but in reverse compared to traditional usage. Instead of testing whether data fits a pre-assumed distribution (like normal or Weibull), EntropyStat uses the K-S test to validate the quality of the EGDF fit itself.
The workflow is: fit the EGDF to data, then apply the K-S test between the fitted EGDF and the empirical CDF. A high p-value confirms that the entropy-based distribution accurately captures the data's true shape. This is a strictly more powerful validation strategy because the EGDF can represent any distribution form — if the K-S test passes, the fit is genuinely good, not just "good for a normal" or "good for a Weibull."
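The workflow above can be sketched generically: SciPy's `kstest` accepts any callable CDF, so a fitted distribution of any form can be checked against the data's ECDF. The EGDF itself is EntropyStat's own estimator and is not reproduced here; a gamma fit stands in for it purely to illustrate the validation step:

```python
# Sketch of the validation step: K-S test between a fitted CDF and the data.
# `fitted_cdf` is a stand-in for the EGDF; any callable returning cumulative
# probabilities works with scipy.stats.kstest.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.gamma(shape=3.0, scale=2.0, size=300)

# Stand-in "fit": a gamma CDF with parameters estimated from the data.
shape, loc, scale = stats.gamma.fit(data, floc=0.0)

def fitted_cdf(x):
    return stats.gamma.cdf(x, shape, loc=loc, scale=scale)

d_stat, p_value = stats.kstest(data, fitted_cdf)
print(f"D = {d_stat:.4f}, p = {p_value:.4f}")
# A high p-value indicates the fitted CDF tracks the empirical CDF closely.
# (Strictly, estimating parameters from the same data inflates this p-value.)
```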
The K-S statistic also guides EntropyStat's automatic Scale parameter optimization. The Scale parameter controls EGDF smoothness, and the optimal Scale is the one that minimizes the K-S statistic — finding the balance between overfitting (too jagged) and underfitting (too smooth).
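The idea of tuning a smoothness parameter by minimizing the K-S statistic can be illustrated with a stand-in: here a Gaussian KDE bandwidth plays the role of the Scale parameter (the actual EGDF optimization is internal to EntropyStat, so this is an analogy, not its implementation):

```python
# Illustrative sketch only: pick a smoothing parameter by minimizing the
# K-S statistic. A KDE bandwidth stands in for EntropyStat's Scale parameter.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = np.sort(rng.normal(0.0, 1.0, 200))
n = len(data)

def ks_for_bandwidth(bw):
    """K-S distance between the KDE's CDF and the empirical CDF."""
    kde = stats.gaussian_kde(data, bw_method=bw)
    cdf = np.array([kde.integrate_box_1d(-np.inf, x) for x in data])
    ecdf_hi = np.arange(1, n + 1) / n  # ECDF just after each point
    ecdf_lo = np.arange(0, n) / n      # ECDF just before each point
    return max(np.max(ecdf_hi - cdf), np.max(cdf - ecdf_lo))

bandwidths = [0.05, 0.1, 0.2, 0.4, 0.8]
best = min(bandwidths, key=ks_for_bandwidth)
print(f"bandwidth minimizing the K-S statistic: {best}")
```

Too small a bandwidth chases individual points (overfitting); too large washes out real structure (underfitting). The K-S distance penalizes both, which is the balance described above.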
Related Terms
EGDF (Entropic Global Distribution Function)
The EGDF is Machine Gnostics' primary distribution estimation method. It constructs a smooth, continuous cumulative distribution function directly from data using entropy-based algebraic optimization, without assuming any parametric form such as normal or Weibull.
Distribution Fitting
Distribution fitting is the process of finding a probability distribution that best describes a dataset. Traditional methods involve selecting a parametric family (normal, Weibull, lognormal) and estimating its parameters, then validating the fit with a goodness-of-fit test.
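That traditional workflow looks roughly like the following sketch (SciPy assumed; the data and candidate families are illustrative). Fitting several families to the same sample and checking each with a K-S test is exactly the step where multiple candidates can all appear acceptable:

```python
# Hypothetical illustration of the traditional workflow: fit several
# parametric families to one dataset, then validate each fit with a K-S test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.lognormal(mean=1.0, sigma=0.4, size=120)

for name, dist in [("weibull_min", stats.weibull_min),
                   ("lognorm", stats.lognorm),
                   ("gamma", stats.gamma)]:
    params = dist.fit(data, floc=0.0)       # estimate parameters (loc fixed at 0)
    d, p = stats.kstest(data, dist.cdf, args=params)
    print(f"{name:12s} D = {d:.3f}  p = {p:.3f}")
# With moderate sample sizes, more than one family often passes at once.
```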
Normal Distribution
The normal (Gaussian) distribution is a symmetric, bell-shaped probability distribution fully described by its mean and standard deviation. It is the foundational assumption behind most classical statistical quality methods, including Cpk, Shewhart charts, and Six Sigma calculations.
Homogeneity Testing
Homogeneity testing determines whether a dataset comes from a single statistical population or contains multiple subpopulations. In manufacturing, non-homogeneous data indicates that the process was not operating in a single stable mode during data collection.
Assumption-Free Statistics
Assumption-free statistics are methods that do not require data to follow a specific probability distribution (like normal, Weibull, or exponential). They derive results directly from the data structure using algebraic and geometric principles rather than probabilistic models with parametric assumptions.
Related Articles
The Distribution Fitting Trap: Weibull, Lognormal, or None of the Above?
Distribution fitting replaces the normality assumption with a different guess. With typical sample sizes, Weibull, lognormal, and gamma all pass goodness-of-fit tests — giving different Cpk values. The distribution fitting step that should fix your analysis becomes its own error source.
Mar 13, 2026
Process Drift Detection Without False Alarms
Process drift hides under false alarms. Shewhart charts catch sudden shifts but miss gradual process drift — while Nelson rules fire on stable data. Entropy-based homogeneity testing separates real drift from noise without chart configuration.
Mar 12, 2026
Hidden Clusters in Your Process Data — and Why Cpk Hides Them
Hidden clusters from multi-cavity molds, shift changes, and material lots produce aggregate Cpk that looks capable — while one subpopulation ships defects. ELDF detects what Cpk can’t see.
Mar 11, 2026