Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 21C5D200B71 for ; Wed, 31 Aug 2016 21:38:24 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 20254160AA7; Wed, 31 Aug 2016 19:38:24 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 6276B160AB4 for ; Wed, 31 Aug 2016 21:38:23 +0200 (CEST) Received: (qmail 84003 invoked by uid 500); 31 Aug 2016 19:38:22 -0000 Mailing-List: contact issues-help@commons.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: issues@commons.apache.org Delivered-To: mailing list issues@commons.apache.org Received: (qmail 83974 invoked by uid 99); 31 Aug 2016 19:38:22 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 31 Aug 2016 19:38:22 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 087462C1B79 for ; Wed, 31 Aug 2016 19:38:22 +0000 (UTC) Date: Wed, 31 Aug 2016 19:38:22 +0000 (UTC) From: "Kexin Xie (JIRA)" To: issues@commons.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Comment Edited] (MATH-1381) BinomialTest P-value > 1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Wed, 31 Aug 2016 19:38:24 -0000 [ https://issues.apache.org/jira/browse/MATH-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453154#comment-15453154 ] Kexin Xie edited comment on MATH-1381 at 8/31/16 7:37 PM: ---------------------------------------------------------- Hi [~erans], thanks for looking at the PR. I agree with you that this does seems like it's a dirty fix and mask a potential bug in the computation. However, the main problem here is that there is one corner case that the current algorithm did not consider. Which is that if the probability is large enough and the success is the same as the number of trials and both numbers are small enough, it will cause the {{criticalValueLow}} to rise too quickly and be the same as {{criticalValueHigh}}. The if condition in L138 is suppose to check the symmetry case when {{pLow == pHigh}}, but is not for the case when {{criticalValueLow == criticalValueHigh}}. At that point the probability will always jump to above 1, but it should really be 1 because {{criticalLow}} is the same as {{criticalHigh}} already (maybe I should return 1 there?). It may seem like a dirty fix, but I have checked against results in R, and Python's scipy equivalent, and they produce the same value. I implemented this way because it actually works in handling this boundary condition, and it's the least change to the original implementation. Note that Python's scipy also uses a similar approach to deal with estimated value rising above 1 https://github.com/scipy/scipy/blob/v0.14.0/scipy/stats/morestats.py#L1661 I've also updated with more exhaustive test cases, please have a look again. Also I think the current implementation is correct as explained above, but I'm happy to change the estimation algorithm if that's required. was (Author: kexinxie): Hi [~erans], thanks for looking at the PR. I agree with you that this does seems like it's a dirty fix and mask a potential bug in the computation. However, the main problem here is that there is one corner case that the current algorithm did not consider. Which is that if the probability is large enough and the success is the same as the number of trials and both numbers are small enough, it will cause the {{criticalValueLow}} to rise too quickly and be the same as {{criticalValueHigh}}. The if condition in L138 is suppose to check the symmetry case when {{pLow == pHigh}}, but is not for the case when {{criticalValueLow == criticalValueHigh}}. At that point the probability will always jump to above 1. It may seem like a dirty fix, but I have checked against results in R, and Python's scipy equivalent, and they produce the same value. I implemented this way because it actually works in handling this boundary condition, and it's the least change to the original implementation. Note that Python's scipy also uses a similar approach to deal with estimated value rising above 1 https://github.com/scipy/scipy/blob/v0.14.0/scipy/stats/morestats.py#L1661 I've also updated with more exhaustive test cases, please have a look again. Also I think the current implementation is correct as explained above, but I'm happy to change the estimation algorithm if that's required. > BinomialTest P-value > 1 > ------------------------ > > Key: MATH-1381 > URL: https://issues.apache.org/jira/browse/MATH-1381 > Project: Commons Math > Issue Type: Bug > Reporter: Wang Qiang > > When I use the Binomial Test, I got p-value > 1 for two sided check. > Example: > (new BinomialTest()).binomialTest(200, 200, 0.9950429, AlternativeHypothesis.TWO_SIDED) == 1.3701357550780435 > In my case, if the expected p-value is 1 (calculated by package in other language, scipy in this case), the p-value returned could be > 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)