commons-dev mailing list archives

From mdigg...@latte.harvard.edu
Subject Re: [math] UnivariateImpl - when sumsq ~ xbar*xbar*((double) n)
Date Tue, 03 Jun 2003 04:41:29 GMT

Phil Steitz wrote:
> Since xbar = sum/n, the change has no impact on which sums are 
> computed or squared. Instead of (sum/n)*(sum/n)*n your change just 
> computes sum**2/n.  The difference is that you are a) eliminating one 
> division by n and one multiplication by n (no doubt a good thing) and b) 
> replacing direct multiplication with pow(-,2). The second of these used 
> to be discouraged, but I doubt it makes any difference with modern 
> compilers.  I would suggest collapsing the denominators and doing just 
> one cast -- i.e., use
> 
> (1) variance = sumsq - sum * (sum/(double) (n * (n - 1)))
> 
> If
> 
> (2) variance = sumsq - (sum * sum)/(double) (n * (n - 1)) or
> 
> (3) variance = sumsq - Math.pow(sum,2)/(double) (n * (n - 1)) give
> 
> better accuracy, use one of them; but I would favor (1) since it will be 
> able to handle larger positive sums.
> 
> I would also recommend forcing getVariance() to return 0 if the result 
> is negative (which can happen in the right circumstances for any of 
> these formulas).
> 
> Phil

Collapsing is definitely good, but I'm not sure about these equations. In my 
experience, an approach along the lines of (2) would look more like

variance = (((double)n)*sumsq - (sum * sum)) / (double) (n * (n - 1));

see (5) in http://mathworld.wolfram.com/k-Statistic.html
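
For reference, a short derivation sketch of why sumsq picks up a factor of n in
the line above. It assumes the usual unbiased sample variance, with
sumsq = sum of x_i^2 and sum = sum of x_i; the symbols s^2 and x_i are notation
introduced here, not taken from the code.

```latex
\begin{align*}
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2
     = \frac{1}{n-1}\Bigl(\sum_{i=1}^{n}x_i^2 - n\bar{x}^2\Bigr) \\
    &= \frac{1}{n-1}\Bigl(\mathrm{sumsq} - \frac{\mathrm{sum}^2}{n}\Bigr)
     = \frac{n\,\mathrm{sumsq} - \mathrm{sum}^2}{n(n-1)}
\end{align*}
```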

As you've stated, this approach seems to have more than one benefit. I'll 
also add a test for negative values and return 0.0 if the result is negative.
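
A minimal sketch of how the collapsed formula and the negative-value check
might fit together. The class name VarianceSketch, the static variance method,
and the choice to return NaN for n < 2 are assumptions made for illustration;
this is not the actual UnivariateImpl code. Casting n to double before
multiplying also avoids int overflow in n * (n - 1) for large n.

```java
public final class VarianceSketch {
    private VarianceSketch() {}

    /**
     * Sample variance from running sums, using
     * (n * sumsq - sum * sum) / (n * (n - 1)).
     * Hypothetical helper: returns NaN for n < 2 (an assumption) and
     * clamps a negative result, which can arise from rounding, to 0.0.
     */
    public static double variance(int n, double sum, double sumsq) {
        if (n < 2) {
            return Double.NaN; // variance is undefined for fewer than 2 values
        }
        double dn = (double) n; // cast first so n * (n - 1) cannot overflow int
        double v = (dn * sumsq - sum * sum) / (dn * (dn - 1.0));
        return v < 0.0 ? 0.0 : v;
    }

    public static void main(String[] args) {
        double[] xs = {1.0, 2.0, 3.0, 4.0, 5.0};
        double sum = 0.0;
        double sumsq = 0.0;
        for (double x : xs) {
            sum += x;
            sumsq += x * x;
        }
        // sum = 15, sumsq = 55, n = 5: (275 - 225) / 20
        System.out.println(variance(xs.length, sum, sumsq)); // prints 2.5
    }
}
```

The clamp only hides the symptom: when sumsq is very close to sum*sum/n, the
subtraction still loses most of its significant digits.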

-Mark



