commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ajo Fod <>
Subject Re: [math] Kolmogorov-Smirnov 2-sample test
Date Sat, 10 Aug 2013 15:59:48 GMT
This depends on data size. If it fits in memory, a single pass through the
sorted array to find the biggest differences would suffice.

If the data doesn't fit, you probably need a StorelessQuantile estimator
like QuantileBin1D from the colt libraries. Then pick a resolution and do
the single pass search.


On Sat, Jul 20, 2013 at 10:01 AM, Phil Steitz <> wrote:

> I am working on MATH-437 (turning K-S distribution into a proper K-S
> test impl) and have to decide how to implement 2-sample tests.
> Asymptotically, the 2-sample D_n,m test statistic (see [1]) has a
> K-S distribution, so for large samples just using the cdf we already
> have is appropriate.  For small samples (actually for any size
> sample), the test statistic distribution is discrete and can be
> computed exactly.  A brute force way to do that is to enumerate all
> of the n-m partitions of {0, ..., n+m-1} and compute all the
> possible D_n,m values.  R seems to use a more clever way to do
> this.  Does anyone have a reference for an efficient way to compute
> the exact distribution, or background on where R got their
> implementation?
> Absent a "clever" approach, I see three alternatives and would
> appreciate some feedback:
> 0) just use the asymptotic distribution even for small samples
> 1) fully enumerate all n-m partitions and compute the D_n,m as above
> 1) use a monte carlo approach - instead of full enumeration of the
> D_n,m, randomly generate a large number of splits and compute the
> p-value for observed D_n,m by computing the number of random n-m
> splits generate a D value less than what is observed.
> Thanks in advance for any feedback / pointers.
> Phil
> [1]
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message