From: mdiggory@latte.harvard.edu
To: Jakarta Commons Developers List
Date: Tue, 3 Jun 2003 00:41:29 -0400
Subject: Re: [math] UnivariateImpl - when sumsq ~ xbar*xbar*((double) n)
Message-ID: <1054615289.3edc26f944cb4@webmail.hmdc.harvard.edu>

Phil Steitz wrote:
> Since xbar = sum/n, the change has no impact on which sums are
> computed or squared. Instead of (sum/n)*(sum/n)*n your change just
> computes sum**2/n. The difference is that you are a) eliminating one
> division by n and one multiplication by n (no doubt a good thing) and b)
> replacing direct multiplication with pow(-,2). The second of these used
> to be discouraged, but I doubt it makes any difference with modern
> compilers. I would suggest collapsing the denominators and doing just
> one cast -- i.e., use
>
> (1) variance = sumsq - sum * (sum/(double) (n * (n - 1)))
>
> If
>
> (2) variance = sumsq - (sum * sum)/(double) (n * (n - 1)) or
>
> (3) variance = sumsq - Math.pow(sum,2)/(double) (n * (n - 1)) give
>
> better accuracy, use one of them; but I would favor (1) since it will be
> able to handle larger positive sums.
>
> I would also recommend forcing getVariance() to return 0 if the result
> is negative (which can happen in the right circumstances for any of
> these formulas).
>
> Phil

Collapsing is definitely good, but I'm not sure about these equations.
In my experience, an approach along the lines of (2) would look more like

    variance = (((double) n) * sumsq - (sum * sum)) / (double) (n * (n - 1));

see (5) in http://mathworld.wolfram.com/k-Statistic.html

As you've stated, this approach seems to have more than one benefit.
I'll also add a test for negative values and return 0.0 if one occurs.

-Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org
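
For illustration only, here is a minimal sketch of how the collapsed formula from the reply, together with the agreed clamp to zero, might look in Java. The class name `RunningVariance`, its fields, and the choice to return 0.0 for fewer than two values are assumptions made for this example; they are not taken from the actual UnivariateImpl source. The sketch also casts n to double before multiplying, an assumption on my part to keep n * (n - 1) from overflowing integer arithmetic.

```java
// Hypothetical helper, not the real UnivariateImpl: keeps running sums
// and derives the sample variance from them.
final class RunningVariance {
    private long n = 0;          // number of values seen so far
    private double sum = 0.0;    // running sum of the values
    private double sumsq = 0.0;  // running sum of the squared values

    void addValue(double v) {
        n++;
        sum += v;
        sumsq += v * v;
    }

    double getVariance() {
        // Assumption: report 0.0 when fewer than two values are present.
        if (n < 2) {
            return 0.0;
        }
        double dn = (double) n;
        // Collapsed form: (n * sumsq - sum^2) / (n * (n - 1)).
        double variance = (dn * sumsq - sum * sum) / (dn * (dn - 1.0));
        // Rounding can push the result slightly below zero when
        // n * sumsq is nearly equal to sum * sum; clamp it to 0.0.
        return variance < 0.0 ? 0.0 : variance;
    }
}
```

For the values 1, 2, 3, 4 the sums are n = 4, sum = 10 and sumsq = 30, so the formula gives (120 - 100) / 12 = 5/3. When all values are nearly identical, the two terms in the numerator almost cancel, which is the case the clamp is meant to cover.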