From: "Brent Worden"
To: "Jakarta Commons Developers List"
Subject: RE: [math] Greetings from a newcomer (but not to numerics)
Date: Sun, 25 May 2003 23:00:07 -0500
> In fact the higher the moment you calculate (variance, skew, kurtosis)
> on the below set of numbers, the greater the loss of precision, this is
> because for values less than one
>
> sumquad << sumcube << sumsq << sum
>
> -Mark
>
> Mark R. Diggory wrote:

And when the values are greater than one, we run the additional risk of
overflow.

Also, because sumsq and n*xbar^2 are large and nearly equal, subtracting
one from the other, as is done when computing the variance, causes a
loss of precision as well.

One possible way to limit these problems is to use central moments in
lieu of raw moments. Since central moments are expected values, they
tend to converge to a finite value as the sample size increases. The
only time they wouldn't converge is when the data is drawn from a
distribution where those higher moments don't exist.

There are easy formulas for skewness and kurtosis based on the central
moments which could be used for the stored univariate implementations:
http://mathworld.wolfram.com/Skewness.html
http://mathworld.wolfram.com/Kurtosis.html

As for the rolling implementations, there might be some more research
involved before using this method because of their memoryless property.
But for starters, sum and sumsq can easily be replaced with their
central-moment counterparts, mean and variance. There are formulas that
update those metrics when a new value is added. Weisberg's "Applied
Linear Regression" outlines two such updating formulas, for the mean and
the sum of squares, which are numerically superior to both direct
computation and the raw-moment methods:
    mean[0]     = 0
    mean[m + 1] = mean[m] + ((1 / (m + 1)) * (x[m + 1] - mean[m]))

    ss[0]     = 0
    ss[m + 1] = ss[m] + ((m / (m + 1)) * (x[m + 1] - mean[m])^2)

where mean[m] is the mean of the first m values, x[m] is the m-th value,
and ss[m] is the sum of squared deviations from the mean for the first
m values.

The sum of squares formula can then be used to derive a similar formula
for the sample variance, var[m] = ss[m] / (m - 1):

    var[1]     = 0.0
    var[m + 1] = ((m - 1) / m) * var[m] + ((1 / (m + 1)) * (x[m + 1] - mean[m])^2),  for m >= 1

where var[m] is the sample variance of the first m values. The
recurrence starts at m = 1 because the (m - 1) / m factor is undefined
at m = 0.

I'd be surprised if similar updating formulas didn't exist for the third
and fourth central moments. I'll look into it further.

Brent Worden
http://www.brent.worden.org
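For illustration, here is a minimal Java sketch of the mean, sum-of-squares and variance updating formulas, applied one value at a time. The class `UpdatingMoments` and its methods are hypothetical names, not part of commons-math. The variance recurrence is only applied for m >= 1, since its (m - 1) / m factor is undefined at m = 0. For comparison, the sketch also computes the raw-moment variance (sumsq - n*xbar^2) / (n - 1) on data with a large common offset, which is where the cancellation problem shows up. The same mean/sum-of-squares recurrence is widely known as Welford's method.

```java
// Hypothetical sketch: class and method names are illustrative, not part of commons-math.
public class UpdatingMoments {
    private long n = 0;         // number of values added so far (m in the formulas)
    private double mean = 0.0;  // mean[m]
    private double ss = 0.0;    // ss[m]: sum of squared deviations from mean[m]
    private double var = 0.0;   // var[m]: sample variance, taken as 0.0 while m < 2

    /** Adds x[m + 1] and updates mean, ss and var with the recurrences. */
    public void addValue(double x) {
        long m = n;
        double dev = x - mean;                       // x[m + 1] - mean[m]
        if (m >= 1) {
            // var[m + 1] = ((m - 1) / m) * var[m] + (1 / (m + 1)) * dev^2, valid for m >= 1
            var = ((double) (m - 1) / m) * var + dev * dev / (m + 1);
        }
        mean += dev / (m + 1);                       // mean[m + 1]
        ss += ((double) m / (m + 1)) * dev * dev;    // ss[m + 1]
        n = m + 1;
    }

    public long getN() { return n; }

    public double getMean() { return mean; }

    public double getSumOfSquares() { return ss; }

    public double getVariance() { return var; }

    /** Raw-moment variance (sumsq - n * xbar^2) / (n - 1), for comparison only. */
    public static double rawMomentVariance(double[] xs) {
        double sum = 0.0;
        double sumsq = 0.0;
        for (double x : xs) {
            sum += x;
            sumsq += x * x;
        }
        int n = xs.length;
        double xbar = sum / n;
        return (sumsq - n * xbar * xbar) / (n - 1);
    }

    public static void main(String[] args) {
        // Four values with a large common offset; their exact sample variance is 30.
        double[] xs = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16};
        UpdatingMoments stats = new UpdatingMoments();
        for (double x : xs) {
            stats.addValue(x);
        }
        System.out.println("updating mean:       " + stats.getMean());
        System.out.println("updating variance:   " + stats.getVariance());
        System.out.println("ss / (n - 1):        " + stats.getSumOfSquares() / (stats.getN() - 1));
        System.out.println("raw-moment variance: " + rawMomentVariance(xs));
    }
}
```

On this input the recurrences recover the exact variance of 30. The raw-moment version instead subtracts two nearly equal numbers of order 4e18 and loses essentially all of its significant digits. In double precision it can even return a negative variance.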