commons-dev mailing list archives

From Phil Steitz <phil.ste...@gmail.com>
Subject [math] Kolmogorov-Smirnov 2-sample test
Date Sat, 20 Jul 2013 17:01:42 GMT
I am working on MATH-437 (turning K-S distribution into a proper K-S
test impl) and have to decide how to implement 2-sample tests. 
Asymptotically, the 2-sample D_n,m test statistic (see [1]) has a
K-S distribution, so for large samples just using the cdf we already
have is appropriate.  For small samples (actually for samples of any
size), the distribution of the test statistic is discrete and can be
computed exactly.  A brute-force way to do that is to enumerate all
of the splits of {0, ..., n+m-1} into subsets of sizes n and m (the
"n-m partitions" below) and compute the D_n,m value for each one.  R
seems to use a cleverer way to do this.  Does anyone have a reference
for an efficient way to compute the exact distribution, or background
on where R got its implementation?
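
To make the brute-force idea concrete, here is a minimal sketch.  It
is not Commons Math code: the class and method names (ExactKs2Sample,
scaledD, exactP) are made up for illustration, and it assumes no ties
in the pooled sample and n + m <= 30 so that a split fits in an int
bitmask.  The statistic is kept on the integer scale n*m*D to avoid
floating-point comparisons.

```java
final class ExactKs2Sample {

    /**
     * Returns n*m*D for one split. Bit i of mask set means that the
     * i-th smallest pooled value belongs to the first sample.
     */
    static long scaledD(int mask, int n, int m) {
        long diff = 0; // n*m*(F_n - G_m) after each pooled value
        long max = 0;
        for (int i = 0; i < n + m; i++) {
            diff += ((mask >>> i) & 1) == 1 ? m : -n;
            max = Math.max(max, Math.abs(diff));
        }
        return max;
    }

    /**
     * Exact P(D_n,m >= d) under the null hypothesis, obtained by
     * enumerating all C(n+m, n) splits. Hypothetical helper; assumes
     * no ties and n + m <= 30.
     */
    static double exactP(double d, int n, int m) {
        if (n < 1 || m < 1 || n + m > 30) {
            throw new IllegalArgumentException("need n, m >= 1 and n + m <= 30");
        }
        // Smallest integer value of n*m*D that counts as ">= d".
        long threshold = (long) Math.ceil(d * n * m - 1e-9);
        long total = 0;
        long hits = 0;
        for (int mask = 0; mask < (1 << (n + m)); mask++) {
            if (Integer.bitCount(mask) != n) {
                continue; // not a split with n elements in sample 1
            }
            total++;
            if (scaledD(mask, n, m) >= threshold) {
                hits++;
            }
        }
        return (double) hits / total;
    }
}
```

The loop visits all 2^(n+m) masks and keeps only those with n bits
set, so it is only practical for small samples, which is exactly the
cost concern raised above.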

Absent a "clever" approach, I see three alternatives and would
appreciate some feedback:

0) just use the asymptotic distribution even for small samples
1) fully enumerate all n-m partitions and compute the D_n,m as above
2) use a Monte Carlo approach - instead of full enumeration of the
D_n,m, randomly generate a large number of splits and compute the
p-value for the observed D_n,m by counting the fraction of random n-m
splits that generate a D value less than what is observed (the
p-value is one minus that fraction).
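
The Monte Carlo alternative could be sketched roughly as follows.
Again, this is illustration only: MonteCarloKs2Sample, scaledD and
monteCarloP are hypothetical names, the pooled sample is assumed to
have no ties, and java.util.Random with a caller-supplied seed stands
in for whatever random generator would really be used.

```java
import java.util.Random;

final class MonteCarloKs2Sample {

    /** Returns n*m*D for a labelling; true means "first sample". */
    static long scaledD(boolean[] inFirst, int n, int m) {
        long diff = 0; // n*m*(F_n - G_m) after each pooled value
        long max = 0;
        for (boolean first : inFirst) {
            diff += first ? m : -n;
            max = Math.max(max, Math.abs(diff));
        }
        return max;
    }

    /**
     * Estimates P(D_n,m >= d) from random n-m splits. Hypothetical
     * helper; assumes no ties in the pooled sample.
     */
    static double monteCarloP(double d, int n, int m,
                              int iterations, long seed) {
        Random rng = new Random(seed);
        boolean[] labels = new boolean[n + m];
        for (int i = 0; i < n; i++) {
            labels[i] = true;
        }
        // Smallest integer value of n*m*D that counts as ">= d".
        long threshold = (long) Math.ceil(d * n * m - 1e-9);
        int below = 0;
        for (int it = 0; it < iterations; it++) {
            // Fisher-Yates shuffle: each split is equally likely.
            for (int i = labels.length - 1; i > 0; i--) {
                int j = rng.nextInt(i + 1);
                boolean tmp = labels[i];
                labels[i] = labels[j];
                labels[j] = tmp;
            }
            if (scaledD(labels, n, m) < threshold) {
                below++; // split gives a D value less than observed
            }
        }
        return 1.0 - (double) below / iterations;
    }
}
```

The estimate carries sampling error of order 1/sqrt(iterations), so
the number of random splits has to be chosen with the desired
p-value precision in mind.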

Thanks in advance for any feedback / pointers.

Phil

[1] http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test


