[ https://issues.apache.org/jira/browse/MATH-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453154#comment-15453154
]
Kexin Xie edited comment on MATH-1381 at 8/31/16 7:38 PM:

Hi [~erans], thanks for looking at the PR. I agree that this looks like a dirty fix that
could mask a potential bug in the computation.
However, the main problem is a corner case that the current algorithm does not consider:
if the probability is large enough, the number of successes equals the number of trials,
and both numbers are small enough, {{criticalValueLow}} rises too quickly and becomes equal
to {{criticalValueHigh}}. The if condition at L138 is supposed to handle the symmetric case
where {{pLow == pHigh}}, but it does not handle the case where
{{criticalValueLow == criticalValueHigh}}. At that point the probability will always jump
above 1, but it should really be 1 because {{criticalLow}} is already the same as
{{criticalHigh}} (maybe I should return 1 there?).
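The corner case above can be reproduced with a small standalone sketch. It is reconstructed from the description in this comment and is not copied from the Commons Math source: the class name, the {{pmf}} helper (a stand-in for the distribution's probability function) and the local names {{low}} and {{high}} are assumptions made for illustration.

```java
public class TwoSidedOvershootSketch {

    // Binomial probability mass P(X = k), computed in log space.
    // Hypothetical stand-in for the library's distribution object.
    public static double pmf(int n, int k, double p) {
        if (p == 0.0) {
            return k == 0 ? 1.0 : 0.0;
        }
        if (p == 1.0) {
            return k == n ? 1.0 : 0.0;
        }
        double logCoeff = 0.0; // log of C(n, k) = prod_{i=1..k} (n - k + i) / i
        for (int i = 1; i <= k; i++) {
            logCoeff += Math.log(n - k + i) - Math.log(i);
        }
        return Math.exp(logCoeff + k * Math.log(p) + (n - k) * Math.log1p(-p));
    }

    // Two-pointer accumulation: add the smaller tail probability and move
    // that pointer inward until one pointer passes the observed count.
    public static double twoSided(int trials, int successes, double p) {
        int low = 0;        // plays the role of criticalValueLow
        int high = trials;  // plays the role of criticalValueHigh
        double total = 0.0;
        while (true) {
            double pLow = pmf(trials, low, p);
            double pHigh = pmf(trials, high, p);
            if (pLow == pHigh) {
                // Meant for symmetric tails, but also taken when low == high,
                // in which case the same outcome is counted twice.
                total += 2 * pLow;
                low++;
                high--;
            } else if (pLow < pHigh) {
                total += pLow;
                low++;
            } else {
                total += pHigh;
                high--;
            }
            if (low > successes || high < successes) {
                break;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Inputs from the issue report; the result exceeds 1 (about 1.37).
        System.out.println(twoSided(200, 200, 0.9950429));
    }
}
```

With the inputs from the report, every outcome from 0 to 199 is less likely than 200, so the low pointer climbs all the way to 200 and the final step adds the probability of 200 twice.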
It may seem like a dirty fix, but I have checked the results against R and Python's scipy
equivalent, and they produce the same values. I implemented it this way because it actually
handles this boundary condition, and it is the smallest change to the original implementation.
Note that Python's scipy also uses a similar approach to deal with the estimated value rising
above 1: https://github.com/scipy/scipy/blob/v0.14.0/scipy/stats/morestats.py#L1661
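As a rough illustration of the clamping idea, here is a minimal sketch that caps the accumulated total at 1. It is not the code from the PR: the class and method names are hypothetical, and the loop takes a precomputed probability array (index = number of successes) instead of a distribution object so the effect can be checked by hand.

```java
public class ClampedTwoSidedSketch {

    // Unclamped two-pointer accumulation over a precomputed pmf array.
    public static double rawTwoSided(double[] pmf, int successes) {
        int low = 0;
        int high = pmf.length - 1;
        double total = 0.0;
        while (true) {
            double pLow = pmf[low];
            double pHigh = pmf[high];
            if (pLow == pHigh) {
                total += 2 * pLow; // double counts when low == high
                low++;
                high--;
            } else if (pLow < pHigh) {
                total += pLow;
                low++;
            } else {
                total += pHigh;
                high--;
            }
            if (low > successes || high < successes) {
                break;
            }
        }
        return total;
    }

    // Same accumulation, with the result capped at 1.
    public static double clampedTwoSided(double[] pmf, int successes) {
        return Math.min(1.0, rawTwoSided(pmf, successes));
    }

    public static void main(String[] args) {
        // Binomial(n = 2, p = 0.9): P(0) = 0.01, P(1) = 0.18, P(2) = 0.81.
        double[] pmf = {0.01, 0.18, 0.81};
        System.out.println(rawTwoSided(pmf, 2));     // exceeds 1
        System.out.println(clampedTwoSided(pmf, 2)); // capped at 1
    }
}
```

The cap leaves ordinary results untouched (for example, a symmetric pmf of 0.25, 0.5, 0.25 with 0 successes still gives 0.5) and only changes the output in the case where both pointers meet on the observed count.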
I've also updated the PR with more exhaustive test cases; please have a look again. I think
the current implementation is correct, as explained above, but I'm happy to change the
estimation algorithm if that's required.
> BinomialTest Pvalue > 1
> 
>
> Key: MATH-1381
> URL: https://issues.apache.org/jira/browse/MATH-1381
> Project: Commons Math
> Issue Type: Bug
> Reporter: Wang Qiang
>
> When I use the Binomial Test, I get a p-value > 1 for the two-sided check.
> Example:
> (new BinomialTest()).binomialTest(200, 200, 0.9950429, AlternativeHypothesis.TWO_SIDED) == 1.3701357550780435
> In my case, when the expected p-value is 1 (as calculated by a package in another language,
> scipy in this case), the returned p-value can be > 1.

