From: "Phil Steitz"
To: "Jakarta Commons Developers List"
Subject: Re: [math] UnivariateImpl - when sumsq ~ xbar*xbar*((double) n)
Date: Mon, 02 Jun 2003 21:57:56 -0700
Message-ID: <3EDC2AD4.60503@steitz.com>
References: <1054615289.3edc26f944cb4@webmail.hmdc.harvard.edu>
List-Id: "Jakarta Commons Developers List"
Content-Type: text/plain; format=flowed; charset="us-ascii"

mdiggory@latte.harvard.edu wrote:
> Phil Steitz wrote:
>
>> Since xbar = sum/n, the change has no impact on which sums are
>> computed or squared. Instead of (sum/n)*(sum/n)*n your change just
>> computes sum**2/n. The difference is that you are a) eliminating one
>> division by n and one multiplication by n (no doubt a good thing) and b)
>> replacing direct multiplication with pow(-,2). The second of these used
>> to be discouraged, but I doubt it makes any difference with modern
>> compilers. I would suggest collapsing the denominators and doing just
>> one cast -- i.e., use
>>
>> (1) variance = sumsq - sum * (sum/(double) (n * (n - 1)))
>>
>> If
>>
>> (2) variance = sumsq - (sum * sum)/(double) (n * (n - 1)) or
>>
>> (3) variance = sumsq - Math.pow(sum,2)/(double) (n * (n - 1)) give
>>
>> better accuracy, use one of them; but I would favor (1) since it will be
>> able to handle larger positive sums.
>>
>> I would also recommend forcing getVariance() to return 0 if the result
>> is negative (which can happen in the right circumstances for any of
>> these formulas).
>>
>> Phil
>
> Collapsing is definitely good, but I'm not sure about these equations. From my
> experience, approaching (2) would look something more like
>
> variance = (((double)n)*sumsq - (sum * sum)) / (double) (n * (n - 1));
>
> see (5) in http://mathworld.wolfram.com/k-Statistic.html

That formula is the formula for the 2nd k-statistic, which is *not* the
same as the sample variance. The standard formula for the sample
variance is presented in equation (3) here:
http://mathworld.wolfram.com/SampleVariance.html or in any elementary
statistics text. Formulas (1)-(3) above (and the current
implementation) are all equivalent to the standard definition. What you
have above is not.
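
For readers who want to check the expressions in this thread numerically, here is a minimal Java sketch. It is not code from the thread or from the Commons Math sources. It assumes the textbook sample variance with an n - 1 denominator, computes it once by the two-pass definition and once from the running totals n, sum and sumsq, and applies the clamp to 0 suggested in the quoted message. The class and method names (VarianceSketch, twoPass, fromSums) are hypothetical.

```java
final class VarianceSketch {

    private VarianceSketch() { }

    /**
     * Two-pass reference value: sum of squared deviations from the mean,
     * divided by n - 1 (assumed textbook definition).
     */
    static double twoPass(double[] values) {
        int n = values.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / n;
        double squaredDeviations = 0.0;
        for (double v : values) {
            double d = v - mean;
            squaredDeviations += d * d;
        }
        return squaredDeviations / (n - 1);
    }

    /**
     * One-pass value from running totals: (sumsq - sum * (sum / n)) / (n - 1).
     * Rounding can push the numerator slightly below zero when sumsq is
     * close to sum * sum / n, so the result is clamped to 0.
     */
    static double fromSums(long n, double sum, double sumsq) {
        if (n < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double numerator = sumsq - sum * (sum / (double) n);
        double variance = numerator / (double) (n - 1);
        return Math.max(0.0, variance);
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0, 4.0};
        double sum = 0.0;
        double sumsq = 0.0;
        for (double v : data) {
            sum += v;
            sumsq += v * v;
        }
        System.out.println("two-pass:  " + twoPass(data));
        System.out.println("from sums: " + fromSums(data.length, sum, sumsq));
    }
}
```

Feeding the same data through both methods gives a quick way to see whether a candidate one-pass expression agrees with the definition it is meant to implement.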
The relation between the variance and the second k-statistic is
presented in (9) on http://mathworld.wolfram.com/k-Statistic.html

> As you've stated, this approach seems to have more than just one benefit. I'll
> also add a test for negative values and return 0.0 if they are present.
>
> -Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org