From: Phil Steitz
Date: Sat, 20 Jul 2013 10:01:42 -0700
To: Commons Developers List
Subject: [math] Kolmogorov-Smirnov 2-sample test

I am working on MATH-437 (turning the K-S distribution into a proper K-S test implementation) and have to decide how to implement 2-sample tests.

Asymptotically, the 2-sample test statistic D_n,m (see [1]), scaled by sqrt(nm/(n+m)), has the limiting K-S distribution, so for large samples just using the cdf we already have is appropriate. For small samples (in fact for any sample size), the distribution of the test statistic is discrete and can be computed exactly. A brute-force way to do that is to enumerate all C(n+m, n) ways of splitting {0, ..., n+m-1} into a subset of size n and a subset of size m, and compute the D_n,m value for each split.

R seems to use a cleverer way to do this. Does anyone have a reference for an efficient way to compute the exact distribution, or background on where R got its implementation?

Absent a "clever" approach, I see three alternatives and would appreciate some feedback:

0) just use the asymptotic distribution, even for small samples
1) fully enumerate all n-m splits and compute D_n,m for each, as above
2) use a Monte Carlo approach: instead of fully enumerating the D_n,m values, randomly generate a large number of n-m splits and estimate the p-value for the observed D_n,m as the fraction of random splits that generate a D value at least as large as the observed one.
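For concreteness, here is a rough Java sketch of the asymptotic, full-enumeration and Monte Carlo options. Everything in it is hypothetical: the class TwoSampleKS and its methods are illustration only, not the Commons Math API and not what R does. It assumes there are no ties in the pooled sample, treats the p-value as P(D_n,m >= observed), works with the integer n*m*D_n,m so the "at least as large as observed" comparison is exact, and uses the standard Kolmogorov series for the asymptotic tail.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch, not the Commons Math API: p-values for the two-sample
// K-S statistic D_n,m. Assumes no ties in the pooled sample.
final class TwoSampleKS {

    // inX[k] is true when the k-th smallest pooled value came from x.
    static boolean[] labels(double[] x, double[] y) {
        double[] a = x.clone();
        double[] b = y.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        boolean[] inX = new boolean[a.length + b.length];
        int i = 0, j = 0;
        for (int k = 0; k < inX.length; k++) {
            inX[k] = j == b.length || (i < a.length && a[i] < b[j]);
            if (inX[k]) i++; else j++;
        }
        return inX;
    }

    // n*m*D_n,m as an exact integer: max over the pooled order of |m*i - n*j|,
    // where i and j count how many x and y values have been seen so far.
    static long scaledD(boolean[] inX, int n, int m) {
        long i = 0, j = 0, best = 0;
        for (boolean fromX : inX) {
            if (fromX) i++; else j++;
            best = Math.max(best, Math.abs(i * m - j * n));
        }
        return best;
    }

    static double statistic(double[] x, double[] y) {
        int n = x.length, m = y.length;
        return (double) scaledD(labels(x, y), n, m) / ((long) n * m);
    }

    // Asymptotic: P(K > t) with t = sqrt(nm/(n+m)) * D_n,m, using the Kolmogorov
    // series P(K > t) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 t^2).
    static double asymptoticP(double[] x, double[] y) {
        int n = x.length, m = y.length;
        double t = Math.sqrt((double) n * m / (n + m)) * statistic(x, y);
        if (t < 0.2) return 1.0; // series converges slowly here; P(K > t) is within ~1e-12 of 1
        double sum = 0;
        for (int k = 1; k <= 100; k++) {
            sum += (k % 2 == 1 ? 1 : -1) * Math.exp(-2.0 * k * k * t * t);
        }
        return Math.min(1.0, 2 * sum);
    }

    // Full enumeration: walk every split (a lattice path with n x-steps and m y-steps),
    // tracking the running max, and count the splits whose n*m*D is >= target.
    private static long countAtLeast(int i, int j, long best, int n, int m, long target) {
        if (i == n && j == m) return best >= target ? 1 : 0;
        long count = 0;
        if (i < n) {
            long d = Math.abs((i + 1L) * m - (long) j * n);
            count += countAtLeast(i + 1, j, Math.max(best, d), n, m, target);
        }
        if (j < m) {
            long d = Math.abs((long) i * m - (j + 1L) * n);
            count += countAtLeast(i, j + 1, Math.max(best, d), n, m, target);
        }
        return count;
    }

    static double exactP(double[] x, double[] y) {
        int n = x.length, m = y.length;
        long target = scaledD(labels(x, y), n, m);
        long total = 1; // C(n+m, n), built up as C(m+k, k) for k = 1..n
        for (int k = 1; k <= n; k++) total = total * (m + k) / k;
        return (double) countAtLeast(0, 0, 0, n, m, target) / total;
    }

    // Monte Carlo: shuffle the labels (i.e. draw random n-m splits) and return
    // the fraction of draws whose n*m*D is >= the observed one.
    static double monteCarloP(double[] x, double[] y, int trials, Random rng) {
        int n = x.length, m = y.length;
        boolean[] perm = labels(x, y);
        long target = scaledD(perm, n, m);
        int hits = 0;
        for (int t = 0; t < trials; t++) {
            for (int k = perm.length - 1; k > 0; k--) { // Fisher-Yates shuffle
                int r = rng.nextInt(k + 1);
                boolean tmp = perm[k];
                perm[k] = perm[r];
                perm[r] = tmp;
            }
            if (scaledD(perm, n, m) >= target) hits++;
        }
        return (double) hits / trials;
    }
}
```

For example, with x = {1, 2, 3} and y = {4, 5, 6}, statistic returns 1.0 and exactP returns 0.1 (only 2 of the C(6, 3) = 20 splits separate the samples completely), asymptoticP gives roughly 0.0996, and monteCarloP(x, y, 100_000, new Random(42)) should land close to 0.1. The enumeration visits every one of the C(n+m, n) splits, so it is only practical for small n and m; the Monte Carlo version trades that cost for sampling error.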
Thanks in advance for any feedback / pointers.

Phil

[1] http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test