Message-ID: <20030614055723.16967.qmail@web41308.mail.yahoo.com>
Date: Fri, 13 Jun 2003 22:57:23 -0700 (PDT)
From: Al Chou
Subject: Re: [math] more improvement to storage free mean, variance computation
To: commons-dev@jakarta.apache.org

>Date: Wed, 04 Jun 2003 21:05:14 -0700
>From: Phil Steitz
>Subject: [math] more improvement to storage free mean, variance computation
>
>Check out procedure sum.2 and var.2 in
>
>http://www.stanford.edu/~glynn/PDF/0208.pdf
>
>The first looks like Brent's suggestion for a corrected mean
>computation, with no memory required.
>The additional computational cost
>that I complained about is documented to be 3x the flops cost of the
>direct computation, but the computation is claimed to be more stable. So
>the question is: do we pay the flops cost to get the numerical
>stability? The example in the paper is compelling; but it uses small
>words (err, numbers I mean -- sorry, slipped into my native Fortran for
>a moment there ;-)). So how do we go about deciding whether the
>stability in the mean computation is worth the increased computational
>effort? I would prefer not to answer "let the user decide". To make
>the decision harder, we should note that it is actually worse than 3x,
>since in the no storage version, the user may request the mean only
>rarely (if at all) and the 3x comparison is against computing the mean
>for each value added.
>
>The variance formula looks better than what we have now, still requiring
>no memory. Should we implement this for the no storage case?

After implementing var.2 from the Stanford paper in UnivariateImpl and
scratching my head for some time over why the variance calculation failed its
JUnit test case, I realized there's a flaw in var.2 that, to my surprise, no
one seems to mention. To update the variance (called S in the paper), the
formula calculates

    z = y / i
    S = S + (i-1) * y * z

where i is the number of data values (including the value just being added to
the collection). It doesn't really matter how y is defined, because you will
notice that

    S = S + (i-1) * y * y / i
      = S + (i-1) * y**2 / i

which means that S can never decrease in magnitude (for real data, which is
what we're talking about). But for the simple case of three data values
{1, 2, 2} in the JUnit test case, the variance decreases between the addition
of the second and third data values.

Can anyone point out what I'm missing here?

Al

=====
Albert Davidson Chou

    Get answers to Mac questions at http://www.Mac-Mgrs.org/ .
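For anyone who wants to poke at the update quoted above, here is a minimal
Java sketch of it. Two things in it are assumptions of mine and not taken from
the message: that y is the deviation of the new value from the running mean
before the update, and that S accumulates the sum of squared deviations, so
that the sample variance would be S / (i - 1). The class name RunningVariance
is hypothetical and is not the UnivariateImpl code.

```java
/** Storage-free running mean and variance; an illustrative sketch only. */
class RunningVariance {
    private long n = 0;         // number of values seen so far (i in the text)
    private double mean = 0.0;  // running mean
    private double s = 0.0;     // ASSUMED: running sum of squared deviations (S)

    /** Adds one value using the update z = y / i, S = S + (i-1) * y * z. */
    public void add(double x) {
        n++;
        double y = x - mean;    // ASSUMED definition of y: deviation from the previous mean
        double z = y / n;
        mean += z;
        s += (n - 1) * y * z;   // the added term is (i-1) * y*y / i, so it is never negative
    }

    public double mean() {
        return mean;
    }

    public double sumOfSquares() {
        return s;
    }

    /** Sample variance S / (i - 1); NaN when fewer than two values were added. */
    public double variance() {
        return n < 2 ? Double.NaN : s / (n - 1);
    }

    public static void main(String[] args) {
        RunningVariance rv = new RunningVariance();
        for (double x : new double[] {1.0, 2.0, 2.0}) {
            rv.add(x);
            System.out.println("x=" + x + " S=" + rv.sumOfSquares()
                    + " variance=" + rv.variance());
        }
    }
}
```

Under those assumptions, feeding in {1, 2, 2} gives S = 0, then 0.5, then about
0.667, while S / (i - 1) goes from 0.5 to about 0.333. So S itself never
decreases, but a variance obtained by dividing S by a growing count can. If
the paper defines S or y differently, this sketch does not apply.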