# commons-dev mailing list archives

From "Phil Steitz" <p...@steitz.com>
Subject Re: [math] UnivariateImpl - when sumsq ~ xbar*xbar*((double) n)
Date Tue, 03 Jun 2003 04:57:56 GMT
```
mdiggory@latte.harvard.edu wrote:
> Phil Steitz wrote:
>
>>Since xbar = sum/n, the change has no impact on which sums are
>>computed or squared. Instead of (sum/n)*(sum/n)*n your change just
>>computes sum**2/n.  The difference is that you are a) eliminating one
>>division by n and one multiplication by n (no doubt a good thing) and b)
>>replacing direct multiplication with pow(-,2). The second of these used
>>to be discouraged, but I doubt it makes any difference with modern
>>compilers.  I would suggest collapsing the denominators and doing just
>>one cast -- i.e., use
>>
>>(1) variance = sumsq - sum * (sum/(double) (n * (n - 1)))
>>
>>If
>>
>>(2) variance = sumsq - (sum * sum)/(double) (n * (n - 1)) or
>>
>>(3) variance = sumsq - Math.pow(sum,2)/(double) (n * (n - 1)) give
>>
>>better accuracy, use one of them; but I would favor (1) since it will be
>>able to handle larger positive sums.
>>
>>I would also recommend forcing getVariance() to return 0 if the result
>>is negative (which can happen in the right circumstances for any of
>>these formulas).
>>
>>Phil
>
>
> Collapsing is definitely good, but I'm not sure about these equations. From my
> experience, approaching (2) would look something more like
>
> variance = (((double)n)*sumsq - (sum * sum)) / (double) (n * (n - 1));
>
> see (5) in http://mathworld.wolfram.com/k-Statistic.html

That formula is the formula for the 2nd k-statistic, which is *not* the
same as the sample variance.  The standard formula for the sample
variance is presented in equation (3) here:
http://mathworld.wolfram.com/SampleVariance.html or in any elementary
statistics text. Formulas (1)-(3) above (and the current implementation)
are all equivalent to the standard definition.

What you have above is not.  The relation between the variance and the
second k-statistic is presented in (9) on
http://mathworld.wolfram.com/k-Statistic.html

>
> As you've stated, this approach seems to have more than just one benefit. I'll
> also place in a test for negative values and return 0.0 if they are present.
>
> -Mark
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-dev-help@jakarta.apache.org
>


```
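For readers following the thread, the running-sums computation under discussion can be sketched in Java as below. This is a minimal sketch, not the UnivariateImpl code: the class `RunningVariance` and the method `sampleVariance` are hypothetical names. It assumes the textbook computational form s^2 = (sumsq - sum^2/n) / (n - 1), in which sumsq is also divided by (n - 1), and it applies two suggestions from the message: direct multiplication rather than `Math.pow`, and returning 0 when rounding makes the result negative. Returning 0 for n < 2 is a further assumption made here.

```java
/** Hypothetical helper for illustration; not the UnivariateImpl code from the thread. */
final class RunningVariance {

    private RunningVariance() {
    }

    /**
     * Sample variance from running totals.
     *
     * @param n     number of values seen so far
     * @param sum   sum of the values
     * @param sumsq sum of the squared values
     */
    static double sampleVariance(long n, double sum, double sumsq) {
        if (n < 2) {
            // Assumption: report 0 when the sample variance is undefined.
            return 0.0;
        }
        // Divide sum by n once, then multiply directly (no Math.pow).
        double numerator = sumsq - sum * (sum / (double) n);
        double variance = numerator / (double) (n - 1);
        // When sumsq ~ sum*sum/n, rounding can leave a tiny negative value.
        return variance < 0.0 ? 0.0 : variance;
    }
}
```

For the values 1, 2, 3, 4 (n = 4, sum = 10, sumsq = 30) the method returns 5/3, about 1.667. For identical values the true variance is 0, and the clamp keeps a rounding error from showing up as a negative result.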