commons-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 36215] - [Math] HypergeometricDistributionImpl cumulativeProbability calculation overflown
Date Fri, 26 Aug 2005 15:39:53 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=36215>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

------- Additional Comments From microarray@gmail.com  2005-08-26 17:39 -------
Okay, for the love of Java, let's look at the details of all the bugs. I suspect
they have not all been caught yet, but if I have the jar I can definitely test
them.

With the current build (8-25-05), the code below

    // Arguments: population size, number of successes, sample size.
    HypergeometricDistributionImpl hDist =
            new HypergeometricDistributionImpl(26932, 270, 823);

    double probability = hDist.probability(52);                   // point probability at 52
    double utprobability = hDist.upperCumulativeProbability(52);  // upper tail at 52
    double cprobability = hDist.cumulativeProbability(52);        // lower cumulative at 52

    System.out.println(probability);
    System.out.println(utprobability);
    System.out.println(cprobability);
    System.out.println(1 - cprobability);

prints:
1.018427824183987E-26
-2.437485768780334E-10
1.0000000002437486
-2.437485768780334E-10
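This shows that in this example the upper-tail cumPro is computed as just
1 - cumPro. That is wrong because of the numeric error accumulation we discussed
earlier: the cumulative sum comes out slightly above 1 (1.0000000002437486), so
the subtraction goes negative.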
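
To illustrate an alternative to 1 - cumPro this far into the tail, here is a
minimal sketch (not the Commons Math implementation) that sums the point
probabilities of the upper tail directly. The class name UpperTailSketch and
its helpers are made up for this example, and the log-factorial table is built
by plain summation, so the last few digits of a result are only approximate.

```java
public class UpperTailSketch {
    // lf[i] = ln(i!), built by cumulative summation of logarithms.
    public static double[] logFactorials(int n) {
        double[] lf = new double[n + 1];
        for (int i = 1; i <= n; i++) {
            lf[i] = lf[i - 1] + Math.log(i);
        }
        return lf;
    }

    // ln C(n, k), valid for 0 <= k <= n.
    public static double logChoose(double[] lf, int n, int k) {
        return lf[n] - lf[k] - lf[n - k];
    }

    // P(X >= x) for population N, K successes and sample size n.
    public static double upperTail(int N, int K, int n, int x) {
        double[] lf = logFactorials(N);
        int hi = Math.min(K, n);                       // largest possible value of X
        int lo = Math.max(x, Math.max(0, n - (N - K))); // clamp to the support
        double logDenominator = logChoose(lf, N, n);
        double sum = 0.0;
        // Start at the far end of the tail so the smallest terms are added first.
        for (int k = hi; k >= lo; k--) {
            sum += Math.exp(logChoose(lf, K, k)
                    + logChoose(lf, N - K, n - k)
                    - logDenominator);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Same parameters as the example above.
        System.out.println(upperTail(26932, 270, 823, 52));
    }
}
```

Every term in the sum is non-negative, so the result cannot drop below zero,
however small the tail is.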

But this obviously does not always happen: when I tried 6000 200 100 50
(population, numSuccess, sample, query), it returned 6.02E-49, as I expected.
What's the mystery here?

Furthermore, if you run the above code with numSuccess and sample swapped in the
constructor,
    HypergeometricDistributionImpl hDist =
            new HypergeometricDistributionImpl(26932, 823, 270);
the result is different:
1.0184278236099406E-26
3.7159353372118176E-10
0.99999999962840647
3.7159353372118176E-10
But by the textbook definition of the hypergeometric distribution, the order of
numSuccess and sample should not matter at all. Notice two things here: A. the
raw probability is slightly different, at the 9th decimal place of the mantissa;
B. the upper tail should have matched the earlier calculation, but it is totally
different.
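
The symmetry itself can be checked exactly. The sketch below uses only
java.math.BigInteger and compares the textbook point probability
C(K,k) C(N-K,n-k) / C(N,n) for both argument orders by cross-multiplying the
exact fractions. SymmetryCheck and sameInBothOrders are hypothetical names for
this example, not part of Commons Math.

```java
import java.math.BigInteger;

public class SymmetryCheck {
    // Exact binomial coefficient C(n, k); every intermediate division is exact.
    public static BigInteger choose(int n, int k) {
        if (k < 0 || k > n) {
            return BigInteger.ZERO;
        }
        k = Math.min(k, n - k);
        BigInteger r = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            // After this step r equals C(n - k + i, i).
            r = r.multiply(BigInteger.valueOf(n - k + i)).divide(BigInteger.valueOf(i));
        }
        return r;
    }

    // True when P(X = k) is identical for (N, K, n) and (N, n, K).
    public static boolean sameInBothOrders(int N, int K, int n, int k) {
        BigInteger num1 = choose(K, k).multiply(choose(N - K, n - k));
        BigInteger den1 = choose(N, n);
        BigInteger num2 = choose(n, k).multiply(choose(N - n, K - k));
        BigInteger den2 = choose(N, K);
        // num1/den1 == num2/den2  <=>  num1*den2 == num2*den1
        return num1.multiply(den2).equals(num2.multiply(den1));
    }

    public static void main(String[] args) {
        System.out.println(sameInBothOrders(26932, 270, 823, 52)); // prints true
    }
}
```

Since the two exact fractions are equal, any difference between the two orders
has to come from how the value is evaluated in floating point, not from the
definition.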

So, summarizing the bugs here:
1. the upper-tail cumulative probability, for some examples, falls outside the
valid range and comes out negative;
2. the upper-tail cumPro is obviously not calculated in a consistent fashion:
sometimes it works, sometimes it does not;
3. the order of numSuccess and sample sometimes matters to the code, while it
never should;
4. the raw probability differs slightly when the order of numSuccess and sample
is changed; if it were calculated in exactly the same way, it would not.

I suggest that when you have a fix, you test it further with many more examples,
like the ones suggested here, and in reversed order, etc. If you could send me
the jar or post it here, I can test it as well, rather than waiting for the
nightly build to include it.

pop    numSuccess  sample  query  upperCumPro
26932  823         270     53     1.4160591836816684E-27
26932  270         823     53
6000   200         100     50     6.020909331626761E-49
6000   100         200     50
26896  895         55      15     2.077516591801479E-10
26896  55          895     15
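
One way to run the parameter sets from the table above in both orders is a small
harness like the following. It is only a sketch: TailRegressionHarness and the
UpperTail adapter are hypothetical names, and the harness checks just two
properties, that each value lies in [0, 1] and that swapping numSuccess and
sample does not change it.

```java
public class TailRegressionHarness {
    /** Hypothetical adapter around whichever upper-tail implementation is under test. */
    public interface UpperTail {
        double apply(int population, int numSuccess, int sample, int query);
    }

    // population, numSuccess, sample, query: the parameter sets from the table above.
    static final int[][] CASES = {
        {26932, 823, 270, 53},
        {6000, 200, 100, 50},
        {26896, 895, 55, 15},
    };

    /** Counts the cases where a value is outside [0, 1] or changes when the arguments are swapped. */
    public static int countFailures(UpperTail impl, double relTol) {
        int failures = 0;
        for (int[] c : CASES) {
            double a = impl.apply(c[0], c[1], c[2], c[3]);
            double b = impl.apply(c[0], c[2], c[1], c[3]); // numSuccess and sample swapped
            boolean inRange = a >= 0.0 && a <= 1.0 && b >= 0.0 && b <= 1.0;
            boolean symmetric =
                    Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
            if (!inRange || !symmetric) {
                failures++;
                System.out.printf("FAIL %d %d %d %d: %s vs %s%n", c[0], c[1], c[2], c[3], a, b);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stand-in that mimics the reported defect (1 - 1.0000000002437486).
        // To test the real library, wrap it instead, e.g.
        // (N, K, n, x) -> new HypergeometricDistributionImpl(N, K, n).upperCumulativeProbability(x)
        UpperTail broken = (N, K, n, x) -> 1.0 - 1.0000000002437486;
        System.out.println(countFailures(broken, 1e-12) + " failing case(s)");
    }
}
```

The reference values in the table could be added as a third check once it is
confirmed how they were computed.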

