Al Chou wrote:
>  Phil Steitz <phil@steitz.com> wrote:
>
>>Al Chou wrote:
>>
>>>>Date: Wed, 04 Jun 2003 21:05:14 0700
>>>>From: Phil Steitz <phil@steitz.com>
>>>>Subject: [math] more improvement to storage free mean, variance computation
>>>>
>>>>Check out procedure sum.2 and var.2 in
>>>>
>>>>http://www.stanford.edu/~glynn/PDF/0208.pdf
>>>>
>>>>The first looks like Brent's suggestion for a corrected mean
>>>>computation, with no memory required. The additional computational cost
>>>>that I complained about is docuemented to be 3x the flops cost of the
>>>>direct computation, but the computation is claimed to be more stable. So
>>>>the question is: do we pay the flops cost to get the numerical
>>>>stability? The example in the paper is compelling; but it uses small
>>>>words (err, numbers I mean  sorry, slipped in to my native Fortran for
>>>>a moment there ;)). So how do we go about deciding whether the
>>>>stability in the mean computation is worth the increased computational
>>>>effort? I would prefer not to answer "let the user decide". To make
>>>>the decision harder, we should note that it is actually worse than 3x,
>>>>since in the no storage version, the user may request the mean only
>>>>rarely (if at all) and the 3x comparison is against computiing the mean
>>>>for each value added.
>>>>
>>>>The variance formula looks better than what we have now, still requiring
>>>>no memory. Should we implement this for the no storage case?
>>>
>>>
>>>After implementing var.2 from the Stanford paper in UnivariateImpl and
>>>scratching my head for some time over why the variance calculation failed
>>
>>its
>>
>>>JUnit test case, I realized there's a flaw in var.2 that I can't understand
>>
>>no
>>
>>>one talks about. To update the variance (called S in the paper), the
>>
>>formula
>>
>>>calculates
>>>
>>>z = y / i
>>>S = S + (i1) * y * z
>>>
>>>where i is the number of data values (including the value just being added
>>
>>to
>>
>>>the collection). It doesn't really matter how y is defined, because you
>>
>>will
>>
>>>notice that
>>>
>>>S = S + (i1) * y * y / i
>>> = S + (i1) * y**2 / i
>>>
>>>which means that S can never decrease in magnitude (for real data, which is
>>>what we're talking about). But for the simple case of three data values
>>
>>{1, 2,
>>
>>>2} in the JUnit test case, the variance decreases between the addition of
>>
>>the
>>
>>>second and third data values.
>>>
>>>Can anyone point out what I'm missing here?
>>>
>>>
>>
>>I think that is OK, since if you look at the definition of S earlier in
>>the paper, S is not the variance, it is the sum of the squared
>>deviations from the mean. This should be always increasing.
>
>
> Where is that definition? I'm looking at equations 3 and 4, which define
> S_{1,q} (in LaTeX notation), and the return statement in algorithm Procedure
> var.2, which says S_{1,q} = S.
Equation 3 defines S to be the sum of squared differences between L_i
and Lbar, which are defined on p. 209 to be the observed values and
their mean.
>
> Anyway, I think the resolution is contained in messages to follow shortly.
>
>
> Al
>
> =====
> Albert Davidson Chou
>
> Get answers to Mac questions at http://www.MacMgrs.org/ .
>
> __________________________________
> Do you Yahoo!?
> SBC Yahoo! DSL  Now only $29.95 per month!
> http://sbc.yahoo.com
>
> 
> To unsubscribe, email: commonsdevunsubscribe@jakarta.apache.org
> For additional commands, email: commonsdevhelp@jakarta.apache.org
>

To unsubscribe, email: commonsdevunsubscribe@jakarta.apache.org
For additional commands, email: commonsdevhelp@jakarta.apache.org
