Date: Tue, 15 Sep 2015 14:00:51 +0000 (UTC)
From: "Otmar Ertl (JIRA)"
To: issues@commons.apache.org
Reply-To: issues@commons.apache.org
Subject: [jira] [Comment Edited] (MATH-1246) Kolmogorov-Smirnov 2-sample test does not correctly handle ties

    [ https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745495#comment-14745495 ]

Otmar Ertl edited comment on MATH-1246 at 9/15/15 2:00 PM:
-----------------------------------------------------------

After some research I have the feeling we are discussing how to define zero divided by zero. There are at least two methods to calculate a reasonable p-value in the presence of ties:

# The method you have proposed, which also seems to be known as the permutation method. Averaging over only some permutations corresponds to the bootstrap method, and averaging over all possible permutations corresponds to the current exactP() implementation.
# Another method is to add some jitter to the sampled values to break ties. (This Google search https://www.google.com/?gfe_rd=cr&ei=qCL4VaKvNIWI8QfLibD4Bg&gws_rd=cr&fg=1#q=jitter+kolmogorov+smirnov immediately gives you a couple of references.) This corresponds to the method I have proposed. Adding small random values to ties to get a strict ordering corresponds to choosing one random ordering. Averaging over all possible orderings would also lead to a well-defined p-value.

Maybe the user should be able to choose the method used to resolve ties?

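To make the jitter idea concrete, here is a minimal sketch in plain Java. It is not the Commons Math implementation; the class TieJitterSketch and its methods are hypothetical names chosen for illustration. It assumes the jitter amplitude is smaller than the smallest gap between distinct sample values, so that only the order among tied values becomes random.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch only; hypothetical helper, not part of Commons Math. */
final class TieJitterSketch {

    /** Returns true if the combined sample contains at least one repeated value. */
    public static boolean hasTies(double[] x, double[] y) {
        double[] all = new double[x.length + y.length];
        System.arraycopy(x, 0, all, 0, x.length);
        System.arraycopy(y, 0, all, x.length, y.length);
        Arrays.sort(all);
        for (int k = 1; k < all.length; k++) {
            if (all[k] == all[k - 1]) {
                return true;
            }
        }
        return false;
    }

    /**
     * Adds uniform noise in [-amplitude/2, amplitude/2) to every value.
     * Assumption: amplitude is below the smallest gap between distinct values.
     */
    public static double[] jitter(double[] values, double amplitude, Random rng) {
        double[] out = new double[values.length];
        for (int k = 0; k < values.length; k++) {
            out[k] = values[k] + amplitude * (rng.nextDouble() - 0.5);
        }
        return out;
    }

    /**
     * Two-sample statistic D = max |F_x(t) - F_y(t)|. The difference is
     * evaluated only after all values equal to t have been consumed from
     * both samples, so tied values are treated as a single step.
     */
    public static double ksStatistic(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        int i = 0;
        int j = 0;
        double d = 0.0;
        while (i < sx.length && j < sy.length) {
            double t = Math.min(sx[i], sy[j]);
            while (i < sx.length && sx[i] == t) {
                i++;
            }
            while (j < sy.length && sy[j] == t) {
                j++;
            }
            d = Math.max(d, Math.abs((double) i / sx.length - (double) j / sy.length));
        }
        return d;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 2, 3};
        double[] y = {2, 3, 4, 5};
        Random rng = new Random(42L);
        double[] xj = jitter(x, 1e-6, rng);
        double[] yj = jitter(y, 1e-6, rng);
        System.out.println("D with ties:   " + ksStatistic(x, y));
        System.out.println("D with jitter: " + ksStatistic(xj, yj));
    }
}
```

A single jittered run picks one random ordering of the tied values, so its D can differ from run to run. Repeating the jitter with different seeds and averaging the resulting p-values would approximate the "average over all orderings" variant mentioned above.
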
> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>
>                 Key: MATH-1246
>                 URL: https://issues.apache.org/jira/browse/MATH-1246
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the distribution of a D-statistic for m-n sets with no ties. No warning or special handling is delivered in the presence of ties.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)