commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Phil Steitz" <>
Subject Re: [math] more improvement to storage free mean, variance computation
Date Sat, 14 Jun 2003 20:34:37 GMT
Al Chou wrote:
>>Date: Wed, 04 Jun 2003 21:05:14 -0700
>>From: Phil Steitz <>
>>Subject: [math] more improvement to storage free mean, variance computation
>>Check out procedure sum.2 and var.2 in
>>The first looks like Brent's suggestion for a corrected mean 
>>computation, with no memory required.  The additional computational cost 
>>that I complained about is docuemented to be 3x the flops cost of the 
>>direct computation, but the computation is claimed to be more stable. So 
>>the question is: do we pay the flops cost to get the numerical 
>>stability?  The example in the paper is compelling; but it uses small 
>>words (err, numbers I mean -- sorry, slipped in to my native Fortran for 
>>a moment there ;-)).  So how do we go about deciding whether the 
>>stability in the mean computation is worth the increased computational 
>>effort?  I would prefer not to answer "let the user decide".  To make 
>>the decision harder, we should note that it is actually worse than 3x, 
>>since in the no storage version, the user may request the mean only 
>>rarely (if at all) and the 3x comparison is against computiing the mean 
>>for each value added.
>>The variance formula looks better than what we have now, still requiring 
>>no memory.  Should we implement this for the no storage case?
> After implementing var.2 from the Stanford paper in UnivariateImpl and
> scratching my head for some time over why the variance calculation failed its
> JUnit test case, I realized there's a flaw in var.2 that I can't understand no
> one talks about.  To update the variance (called S in the paper), the formula
> calculates
> z = y / i
> S = S + (i?1) * y * z
> where i is the number of data values (including the value just being added to
> the collection).  It doesn't really matter how y is defined, because you will
> notice that
> S = S + (i?1) * y * y / i
>   = S + (i?1) * y**2 / i
> which means that S can never decrease in magnitude (for real data, which is
> what we're talking about).  But for the simple case of three data values {1, 2,
> 2} in the JUnit test case, the variance decreases between the addition of the
> second and third data values.
> Can anyone point out what I'm missing here?
I think that is OK, since if you look at the definition of S earlier in 
the paper, S is not the variance, it is the sum of the squared 
deviations from the mean.  This should be always increasing.

> Al
> =====
> Albert Davidson Chou
>     Get answers to Mac questions at .
> __________________________________
> Do you Yahoo!?
> SBC Yahoo! DSL - Now only $29.95 per month!
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message