Phil Steitz wrote:
> mdiggory@latte.harvard.edu wrote:
>
>> Phil Steitz wrote:
>>
>>> Since xbar = sum/n, the change has no impact on which sums are
>>> computed or squared. Instead of (sum/n)*(sum/n)*n your change just
>>> computes sum**2/n. The difference is that you are a) eliminating one
>>> division by n and one multiplication by n (no doubt a good thing) and
>>> b) replacing direct multiplication with pow(,2). The second of these
>>> used to be discouraged, but I doubt it makes any difference with
>>> modern compilers. I would suggest collapsing the denominators and
>>> doing just one cast, i.e., use
>>>
>>> (1) variance = sumsq - sum * (sum/(double) (n * (n - 1)))
>>>
>>> If
>>>
>>> (2) variance = sumsq - (sum * sum)/(double) (n * (n - 1))) or
>>>
>>> (3) variance = sumsq - Math.pow(sum,2)/(double) (n * (n - 1))) give
>>>
>>> better accuracy, use one of them; but I would favor (1) since it will
>>> be able to handle larger positive sums.
>>>
>>> I would also recommend forcing getVariance() to return 0 if the
>>> result is negative (which can happen in the right circumstances for
>>> any of these formulas).
>>>
>>> Phil
>>
>>
>>
>> Collapsing is definitely good, but I'm not sure about these equations.
>> From my experience, approaching (2) would look something more like
>>
>> variance = (((double)n)*sumsq - (sum * sum)) / (double) (n * (n - 1));
>>
>> see (5) in http://mathworld.wolfram.com/k-Statistic.html
>
>
> That formula is the formula for the 2nd k-statistic, which is *not* the
> same as the sample variance. The standard formula for the sample
> variance is presented in equation (3) here:
> http://mathworld.wolfram.com/SampleVariance.html or in any elementary
> statistics text. Formulas (1)-(3) above (and the current implementation)
> are all equivalent to the standard definition.
>
> What you have above is not. The relation between the variance and the
> second k-statistic is presented in (9) on
> http://mathworld.wolfram.com/k-Statistic.html
I just realized that I misread Wolfram's definitions. What he is
defining as the 2nd k-statistic is the correct formula for the sample
variance. I am also missing some parentheses above. Your formula is
correct. Sorry.
Phil
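
For reference, here is a minimal sketch of the formula agreed on above,
with the clamp to zero for negative results. The class and method names
are hypothetical and this is not the actual commons-math code; returning
0 for n < 2 is also just an assumption made for the sketch.

```java
// Hypothetical helper, not part of any existing library.
final class VarianceSketch {
    private VarianceSketch() {}

    /**
     * Sample variance from running sums:
     *   (n * sumsq - sum * sum) / (n * (n - 1))
     *
     * @param n     number of values
     * @param sum   sum of the values
     * @param sumsq sum of the squared values
     */
    static double variance(long n, double sum, double sumsq) {
        if (n < 2) {
            // Assumption for this sketch: treat the undefined case as 0.
            return 0.0;
        }
        // Cast once, before multiplying, so n * (n - 1) is done in double.
        double dn = (double) n;
        double v = (dn * sumsq - sum * sum) / (dn * (dn - 1.0));
        // Rounding error can push the result slightly below zero.
        return v < 0.0 ? 0.0 : v;
    }
}
```

Casting n to double before multiplying also avoids integer overflow in
n * (n - 1) for large n. For the values 1, 2, 3, 4 (n = 4, sum = 10,
sumsq = 30) the formula gives 20/12, i.e. 5/3.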
>
>>
>> As you've stated, this approach seems to have more than just one
>> benefit. I'll also place in a test for negative values and return 0.0
>> if they are present.
>>
>> Mark
>>
>>
>> 
>> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: commons-dev-help@jakarta.apache.org
>>

