commons-dev mailing list archives

From "Phil Steitz" <p...@steitz.com>
Subject Re: [math] UnivariateImpl - when sumsq ~ xbar*xbar*((double) n)
Date Tue, 03 Jun 2003 05:42:44 GMT
Phil Steitz wrote:
> mdiggory@latte.harvard.edu wrote:
> 
>> Phil Steitz wrote:
>>
>>> Since xbar = sum/n, the change has no impact on which sums are 
>>> computed or squared. Instead of (sum/n)*(sum/n)*n your change just 
>>> computes sum**2/n.  The difference is that you are a) eliminating one 
>>> division by n and one multiplication by n (no doubt a good thing) and 
>>> b) replacing direct multiplication with pow(-,2). The second of these 
>>> used to be discouraged, but I doubt it makes any difference with 
>>> modern compilers.  I would suggest collapsing the denominators and 
>>> doing just one cast -- i.e., use
>>>
>>> (1) variance = sumsq - sum * (sum/(double) (n * (n - 1)))
>>>
>>> If
>>>
>>> (2) variance = sumsq - (sum * sum)/(double) (n * (n - 1))) or
>>>
>>> (3) variance = sumsq - Math.pow(sum,2)/(double) (n * (n - 1))) give
>>>
>>> better accuracy, use one of them; but I would favor (1) since it will 
>>> be able to handle larger positive sums.
>>>
>>> I would also recommend forcing getVariance() to return 0 if the 
>>> result is negative (which can happen in the right circumstances for 
>>> any of these formulas).
>>>
>>> Phil
>>
>>
>>
>> Collapsing is definitely good, but I'm not sure about these equations. 
>> From my experience, approach (2) would look something more like
>>
>> variance = (((double)n)*sumsq - (sum * sum)) / (double) (n * (n - 1));
>>
>> see (5) in http://mathworld.wolfram.com/k-Statistic.html
> 
> 
> That formula is the formula for the 2nd k-statistic, which is *not* the 
> same as the sample variance.  The standard formula for the sample 
> variance is presented in equation (3) here: 
> http://mathworld.wolfram.com/SampleVariance.html or in any elementary 
> statistics text. Formulas (1)-(3) above (and the current implementation) 
> are all equivalent to the standard definition.
> 
> What you have above is not.  The relation between the variance and the 
> second k-statistic is presented in (9) on 
> http://mathworld.wolfram.com/k-Statistic.html

I just realized that I misread Wolfram's definitions. What he is 
defining as the 2nd k-statistic is the correct formula for the sample 
variance.  I am also missing some parentheses above.  Your formula is 
correct.  Sorry.

Phil
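
For reference, here is a minimal sketch of the formula Mark gives above, 
(n*sumsq - sum*sum) / (n*(n - 1)), together with the clamp to 0.0 for 
negative results that we both suggested. It is not the UnivariateImpl 
code: the class name VarianceSketch, the method signature, and the choice 
to return 0.0 when n < 2 are assumptions made for illustration. It also 
converts n to double before multiplying, rather than casting the integer 
product n * (n - 1), so that the product cannot overflow for large n.

```java
final class VarianceSketch {

    private VarianceSketch() {
    }

    /**
     * Sample variance from running totals, computed as
     * (n * sumsq - sum * sum) / (n * (n - 1)).
     *
     * n is the number of values, sum is the sum of the values and
     * sumsq is the sum of the squared values.
     */
    static double variance(long n, double sum, double sumsq) {
        if (n < 2) {
            // Assumption: with fewer than two values, report 0.0.
            return 0.0;
        }
        // Convert n once so n * (n - 1) is evaluated in floating point.
        double dn = (double) n;
        double v = (dn * sumsq - sum * sum) / (dn * (dn - 1.0));
        // When sumsq is close to sum * sum / n, cancellation can push
        // the result slightly below zero; clamp it to 0.0.
        return v < 0.0 ? 0.0 : v;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 3.0, 4.0, 5.0};
        double sum = 0.0;
        double sumsq = 0.0;
        for (double x : values) {
            sum += x;
            sumsq += x * x;
        }
        // sum = 15, sumsq = 55, n = 5: (275 - 225) / 20 = 2.5
        System.out.println(variance(values.length, sum, sumsq));
    }
}
```

The clamp only hides the symptom of the cancellation named in the subject 
line (sumsq ~ xbar*xbar*n); it does not recover the digits that were lost.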

> 
>>
>> As you've stated, this approach seems to have more than just one 
>> benefit. I'll also add a test for negative values and return 0.0 
>> if they are present.
>>
>> -Mark
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: commons-dev-help@jakarta.apache.org
>>

