From "Thomas Neidhart (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MATH-790) Mann-Whitney U Test Suffers From Integer Overflow With Large Data Sets
Date Tue, 12 Jun 2012 11:52:43 GMT
[ https://issues.apache.org/jira/browse/MATH-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293552#comment-13293552 ]
Thomas Neidhart commented on MATH-790:
As discussed on the ML, there may still be a problem with integer overflow in the following code fragment:
final double n1n2prod = n1 * n2;

// http://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U#Normal_approximation
final double EU = n1n2prod / 2.0;
final double VarU = n1n2prod * (n1 + n2 + 1) / 12.0;

final double z = (Umin - EU) / FastMath.sqrt(VarU);
The calculation of n1n2prod may still overflow if n1 and n2 are large enough, because the multiplication
is performed in int arithmetic before the result is widened to double. I would suggest changing it as follows:

final long n1n2prod = (long) n1 * n2;

// http://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U#Normal_approximation
final double EU = n1n2prod / 2.0;
final double VarU = n1n2prod * (n1 + n2 + 1) / 12.0;

final double z = (Umin - EU) / FastMath.sqrt(VarU);
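
To make the difference concrete, here is a minimal, self-contained sketch comparing the two ways of forming the product. The class name MannWhitneyOverflowDemo and its helper methods are hypothetical and exist only for illustration; they are not part of Commons Math. The varU helper additionally keeps every intermediate in double. That is an assumption on my side rather than part of the suggestion above: in the suggested fragment, (n1 + n2 + 1) is still an int addition and the subsequent long multiplication could in principle wrap for extremely large samples.

```java
public class MannWhitneyOverflowDemo {

    /** Mirrors the original fragment: the product is computed in int, then widened to double. */
    static double intProduct(final int n1, final int n2) {
        final double n1n2prod = n1 * n2; // int multiplication, wraps around silently
        return n1n2prod;
    }

    /** Mirrors the suggested change: one operand is cast to long before multiplying. */
    static long longProduct(final int n1, final int n2) {
        return (long) n1 * n2;
    }

    /** Hypothetical variant (an assumption): all intermediates of VarU are computed in double. */
    static double varU(final int n1, final int n2) {
        final double n1n2prod = (double) n1 * n2;
        return n1n2prod * ((double) n1 + n2 + 1) / 12.0;
    }

    public static void main(String[] args) {
        final int n1 = 100000;
        final int n2 = 100000;
        System.out.println("int product:  " + intProduct(n1, n2));
        System.out.println("long product: " + longProduct(n1, n2));
        System.out.println("VarU:         " + varU(n1, n2));
    }
}
```

With two samples of 100000 elements each, the true product 10^10 no longer fits into an int, so the first helper returns a wrapped value without any error, while the second returns the exact product.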
