T-test p-value precision hampered by machine epsilon

Key: MATH-201
URL: https://issues.apache.org/jira/browse/MATH-201
Project: Commons Math
Issue Type: Bug
Affects Versions: 1.2
Reporter: Peter Wyngaard
Priority: Minor
The smallest p-value returned by TTestImpl.tTest() is the machine epsilon, which is 2.220446E-16
with IEEE 754 64-bit double-precision floats.
We found this bug while porting some analysis software from R to Java, and noticed that the p-values
did not match up. We believe we've identified why this is happening in commons-math-1.2,
and a possible solution.
Please be gentle, as I am not a statistics expert!
The following method in org.apache.commons.math.stat.inference.TTestImpl currently
calculates the p-value for a 2-sided, 2-sample t-test:
protected double tTest(double m1, double m2, double v1, double v2, double n1, double n2)
and it returns:
1.0 - distribution.cumulativeProbability(-t, t);
at line 1034 in version 1.2.
double cumulativeProbability(double x0, double x1) is implemented by org.apache.commons.math.distribution.AbstractDistribution,
and returns:
return cumulativeProbability(x1) - cumulativeProbability(x0);
So in essence, the pvalue returned by TTestImpl.tTest() is:
1.0 - (cumulativeProbability(t) - cumulativeProbability(-t))
For large-ish t-statistics, cumulativeProbability(-t) can get quite small, and cumulativeProbability(t)
can get very close to 1.0. When cumulativeProbability(-t) is less than the machine epsilon,
we get p-values equal to zero because:
1.0 - 1.0 + 0.0 = 0.0
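
To make the rounding concrete, here is a minimal standalone sketch of the cancellation. It does not call commons-math; the class name EpsilonAbsorptionDemo, the helper oneMinusDifference, and the lower-tail value 1.0e-20 are all made up for illustration.

```java
public class EpsilonAbsorptionDemo {
    // Evaluates 1 - (upper - lower), the same shape as the expression in TTestImpl.
    static double oneMinusDifference(double lower, double upper) {
        return 1.0 - (upper - lower);
    }

    public static void main(String[] args) {
        // Spacing between 1.0 and the next larger double, i.e. the machine epsilon.
        System.out.println(Math.ulp(1.0));

        double lower = 1.0e-20; // hypothetical value standing in for cumulativeProbability(-t)
        double upper = 1.0;     // cumulativeProbability(t), already rounded to exactly 1.0

        // upper - lower rounds back to 1.0, so the tiny lower tail is lost entirely.
        System.out.println(oneMinusDifference(lower, upper));
    }
}
```

The first line printed is the epsilon quoted at the top of this report, and the second is 0.0 even though the lower tail passed in was nonzero.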
An alternative calculation for the p-value of a 2-sided, 2-sample t-test is:
p = 2.0 * cumulativeProbability(-t)
This calculation does not suffer from the machine epsilon problem, and we are now getting
p-values much smaller than the 2.2E-16 limit we were seeing previously.
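
As a rough way to compare the two formulas without commons-math, the sketch below uses the t distribution with one degree of freedom, whose CDF has the closed form 0.5 + atan(x)/pi. This stand-in CDF, the class name TwoSidedPValueDemo, and the test statistic 1.0e17 are assumptions chosen for illustration, not code from TTestImpl; the real TDistribution implementation is different.

```java
public class TwoSidedPValueDemo {
    // CDF of the t distribution with 1 degree of freedom, written so that the
    // tail probability is computed directly: atan(1/|x|)/pi equals 0.5 - atan(|x|)/pi.
    static double cdf(double x) {
        if (x == 0.0) {
            return 0.5;
        }
        double tail = Math.atan(1.0 / Math.abs(x)) / Math.PI;
        return x < 0.0 ? tail : 1.0 - tail;
    }

    // Formula described in this report for version 1.2: 1 - (F(t) - F(-t)).
    static double pValueBySubtraction(double t) {
        double a = Math.abs(t);
        return 1.0 - (cdf(a) - cdf(-a));
    }

    // Proposed alternative: 2 * F(-|t|), relying on the symmetry of the distribution.
    static double pValueFromLowerTail(double t) {
        return 2.0 * cdf(-Math.abs(t));
    }

    public static void main(String[] args) {
        double t = 1.0e17; // hypothetical, very large t-statistic
        System.out.println("subtraction: " + pValueBySubtraction(t));
        System.out.println("lower tail:  " + pValueFromLowerTail(t));
    }
}
```

With this statistic the subtraction form collapses to zero while the lower-tail form stays positive. For moderate statistics the two forms agree to within rounding, so the change should only matter in the extreme tails.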

