 "Mark R. Diggory" <mdiggory@latte.harvard.edu> wrote:
> Al Chou wrote:
> > After implementing var.2 from the Stanford paper in UnivariateImpl and
> > scratching my head for some time over why the variance calculation failed
> > its JUnit test case, I realized there's a flaw in var.2 that I can't
> > understand no one talks about. To update the variance (called S in the
> > paper), the formula calculates
> >
> > z = y / i
> > S = S + (i - 1) * y * z
> >
> > where i is the number of data values (including the value just being added
> > to the collection). It doesn't really matter how y is defined, because you
> > will notice that
> >
> > S = S + (i - 1) * y * y / i
> >   = S + (i - 1) * y**2 / i
> >
> > which means that S can never decrease in magnitude (for real data, which is
> > what we're talking about). But for the simple case of three data values
> > {1, 2, 2} in the JUnit test case, the variance decreases between the
> > addition of the second and third data values.
> >
> > Can anyone point out what I'm missing here?
>
> Al, I see what you're saying. I wrote a little example case to implement
> the pseudocode they have in the paper:
>
> public class SmallTest {
>
>     public static void main(String[] args) {
>         double[] vals = new double[] { 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0 };
>
>         // Start from the first value: the mean is that value, s is zero.
>         double m = vals[0];
>         double s = 0.0;
>
>         System.out.println("m=" + m);
>         System.out.println("s=" + s);
>         System.out.println("");
>
>         // i counts the values seen so far, including the one being added.
>         for (int i = 2; i <= vals.length; i++) {
>
>             double y = vals[i - 1] - m;
>             double z = y / i;
>             m += z;
>             s += (i - 1) * y * z;
>
>             System.out.println("y=" + y);
>             System.out.println("z=" + z);
>             System.out.println("m=" + m);
>             System.out.println("s=" + s);
>             System.out.println("");
>         }
>     }
> }
>
> s does seem to increase even though the variance of the data
> should be going down.
>
> I want us to review this paper further and go back to the research of
>
> Hanson, R. J. 1975. Stably updating mean and standard
> deviation of data. Communications of the
> ACM 18:57-58.
> Stanford, where he currently holds the Thomas Ford
> Chair in the Department of Engineering-Economic
>
> Let's verify whether there's a typo in the equation or something. Maybe
> these guys even misinterpreted his work.

Thanks for trying it out, Mark. Your code reads substantially the same as
mine, except that I was working inside of UnivariateImpl.

Google can't find the original paper online, but it does find Richard J.
Hanson's personal Web site, containing a bibliography of his publications and
two email addresses for him. Anyone have the courage to email him without
having first read the original paper? I wish I could derive the (or at least
an) updating variance formula myself; maybe I should try again.
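
One guess, pending a look at Hanson's paper: perhaps S is not the variance
itself but the running sum of squared deviations from the mean. In that case
S would be expected to grow, and the sample variance would be S / (i - 1),
which can still go down. The sketch below only illustrates that reading. It
is an assumption on my part, not something confirmed against either paper,
and the class and method names (RunningVariance, update, sampleVariance) are
made up for the example.

```java
final class RunningVariance {

    /**
     * Runs the update from the thread over all values and returns {m, s}.
     * ASSUMPTION: s is treated as the sum of squared deviations from the
     * mean, not as the variance itself.
     */
    static double[] update(double[] vals) {
        double m = vals[0];
        double s = 0.0;
        // i counts the values seen so far, including the one being added.
        for (int i = 2; i <= vals.length; i++) {
            double y = vals[i - 1] - m;
            double z = y / i;
            m += z;
            s += (i - 1) * y * z;
        }
        return new double[] { m, s };
    }

    /** Sample variance under the assumption above: s / (n - 1). */
    static double sampleVariance(double[] vals) {
        if (vals.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        return update(vals)[1] / (vals.length - 1);
    }

    public static void main(String[] args) {
        // s grows from the second value to the third ...
        System.out.println("s(2)=" + update(new double[] { 1.0, 2.0 })[1]);
        System.out.println("s(3)=" + update(new double[] { 1.0, 2.0, 2.0 })[1]);
        // ... while s / (n - 1) shrinks.
        System.out.println("var(2)=" + sampleVariance(new double[] { 1.0, 2.0 }));
        System.out.println("var(3)=" + sampleVariance(new double[] { 1.0, 2.0, 2.0 }));
    }
}
```

For {1, 2, 2}, working it by hand gives s = 1/2 after two values and s = 2/3
after three, so s / (n - 1) goes from 1/2 to 1/3. That would match the
decrease the JUnit test expects, if the assumption holds.
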
Al
