In:
http://ocw.mit.edu/courses/mathematics/18443statisticsforapplicationsfall2006/lecturenotes/lecture14.pdf
Take a look at 2 sample KS stats and the relationship to the 1 sample ...
page 88. You already have the 1 sample table.
Cheers,
Ajo
On Sat, Aug 10, 2013 at 9:16 AM, Phil Steitz <phil.steitz@gmail.com> wrote:
> On 8/10/13 8:59 AM, Ajo Fod wrote:
> > This depends on data size. If it fits in memory, a single pass through
> the
> > sorted array to find the biggest differences would suffice.
> >
> > If the data doesn't fit, you probably need a StorelessQuantile estimator
> > like QuantileBin1D from the colt libraries. Then pick a resolution and do
> > the single pass search.
>
> Thanks, Ajo. I have no problem computing the D_n,m statistics. My
> problem is in computing the exact pvalues for the test. For that,
> I need to compute the exact distribution of D_n,m. Bruteforcing
> requires that you examine every element of n + m choose n. R seems
> to use a clever approach, but there is no documentation in the R
> sources on how the algorithm works. Moreover, my first attempts at
> Monte Carlo simulation don't give the same results. Most likely, I
> have not set the simulation up correctly. Any better ideas or
> references on how to compute the exact distribution would be
> appreciated.
>
> Phil
> >
> > Cheers,
> > Ajo
> >
> >
> > On Sat, Jul 20, 2013 at 10:01 AM, Phil Steitz <phil.steitz@gmail.com>
> wrote:
> >
> >> I am working on MATH437 (turning KS distribution into a proper KS
> >> test impl) and have to decide how to implement 2sample tests.
> >> Asymptotically, the 2sample D_n,m test statistic (see [1]) has a
> >> KS distribution, so for large samples just using the cdf we already
> >> have is appropriate. For small samples (actually for any size
> >> sample), the test statistic distribution is discrete and can be
> >> computed exactly. A brute force way to do that is to enumerate all
> >> of the nm partitions of {0, ..., n+m1} and compute all the
> >> possible D_n,m values. R seems to use a more clever way to do
> >> this. Does anyone have a reference for an efficient way to compute
> >> the exact distribution, or background on where R got their
> >> implementation?
> >>
> >> Absent a "clever" approach, I see three alternatives and would
> >> appreciate some feedback:
> >>
> >> 0) just use the asymptotic distribution even for small samples
> >> 1) fully enumerate all nm partitions and compute the D_n,m as above
> >> 1) use a monte carlo approach  instead of full enumeration of the
> >> D_n,m, randomly generate a large number of splits and compute the
> >> pvalue for observed D_n,m by computing the number of random nm
> >> splits generate a D value less than what is observed.
> >>
> >> Thanks in advance for any feedback / pointers.
> >>
> >> Phil
> >>
> >> [1] http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
> >>
> >> 
> >> To unsubscribe, email: devunsubscribe@commons.apache.org
> >> For additional commands, email: devhelp@commons.apache.org
> >>
> >>
>
>
> 
> To unsubscribe, email: devunsubscribe@commons.apache.org
> For additional commands, email: devhelp@commons.apache.org
>
>
