# commons-dev mailing list archives

From mdigg...@latte.harvard.edu
Subject Re: [math] UnivariateImpl - when sumsq ~ xbar*xbar*((double) n)
Date Tue, 03 Jun 2003 04:41:29 GMT
```
Phil Steitz wrote:
> Since xbar = sum/n, the change has no impact on the which sums are
> computes sum**2/n.  The difference is that you are a) eliminating one
> division by n and one multiplication by n (no doubt a good thing) and b)
> replacing direct multiplication with pow(-,2). The second of these used
> to be discouraged, but I doubt it makes any difference with modern
> compilers.  I would suggest collapsing the denominators and doing just
> one cast -- i.e., use
>
> (1) variance = sumsq - sum * (sum/(double) (n * (n - 1)))
>
> If
>
> (2) variance = sumsq - (sum * sum)/(double) (n * (n - 1)) or
>
> (3) variance = sumsq - Math.pow(sum,2)/(double) (n * (n - 1)) give
>
> better accuracy, use one of them; but I would favor (1) since it will be
> able to handle larger positive sums.
>
> I would also recommend forcing getVariance() to return 0 if the result
> is negative (which can happen in the right circumstances for any of
> these formulas).
>
> Phil

Collapsing is definitely good, but I'm not sure about these equations. From my
experience, approach (2) would look something more like

variance = (((double)n)*sumsq - (sum * sum)) / (double) (n * (n - 1));

see (5) in http://mathworld.wolfram.com/k-Statistic.html

As you've stated, this approach seems to have more than just one benefit. I'll
also add a test for negative values and return 0.0 if they are present.

-Mark
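
// A minimal sketch of the single-division formula discussed in this message,
// combined with the proposed guard against negative results. It assumes the
// running totals n, sum and sumsq are maintained elsewhere. The class and
// method names are hypothetical (this is not the UnivariateImpl API), and
// returning 0.0 when n < 2 is an assumption made only for this sketch.
public class VarianceSketch {

    // Sample variance from the count, the sum and the sum of squares.
    public static double variance(int n, double sum, double sumsq) {
        if (n < 2) {
            return 0.0; // assumption: fewer than two values yields 0.0
        }
        // Convert n to double before multiplying, so that n * (n - 1)
        // is computed in floating point and cannot overflow int.
        double dn = (double) n;
        double v = (dn * sumsq - sum * sum) / (dn * (dn - 1.0));
        // When n * sumsq is close to sum * sum, cancellation can push the
        // result slightly below zero, so clamp it to 0.0.
        return v < 0.0 ? 0.0 : v;
    }

    public static void main(String[] args) {
        // Values 1, 2, 3, 4, 5 give n = 5, sum = 15, sumsq = 55.
        System.out.println(variance(5, 15.0, 55.0)); // prints 2.5
    }
}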
