> In fact the higher the moment you calculate (variance, skew, kurtosis)
> on the below set of numbers, the greater the loss of precision, this is
> because for values less than one
>
> sumquad << sumcube << sumsq << sum
>
> Mark
>
> Mark R. Diggory wrote:
And when the values are greater than one, we run the additional risk of
overflow. Also, because sumsq and n*xbar^2 are both large and nearly
equal, subtracting one from the other, as done in computing variance,
results in loss of precision as well.
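
To make the cancellation concrete, here is a small Python sketch. The function
names and the sample data are made up for illustration and are not taken from
the Commons code. It compares the raw-moment formula with a plain two-pass
computation on values that have a large mean and a small spread; the exact
sample variance of this data is 30.

```python
def variance_raw(values):
    """Sample variance from raw moments: (sumsq - n * xbar^2) / (n - 1)."""
    n = len(values)
    total = 0.0
    sumsq = 0.0
    for x in values:
        total += x
        sumsq += x * x
    xbar = total / n
    # Two quantities near 4e18 are subtracted here, so the low-order
    # digits that carry the variance have already been rounded away.
    return (sumsq - n * xbar * xbar) / (n - 1)


def variance_two_pass(values):
    """Sample variance from deviations about the mean (two passes)."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


# Illustrative data: large mean, small spread.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]

print("raw moments:", variance_raw(data))
print("two pass:   ", variance_two_pass(data))
```

In double precision the two-pass result is exact for this data, while the
raw-moment result is off by at least the size of the variance itself.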
One possible way to limit these problems is by using central moments in lieu
of raw moments. Since central moments are expected values, they tend to
converge to a finite value as the sample size increases. The only time
they wouldn't converge is when the data is drawn from a distribution where
those higher moments don't exist.
There are easy formulas for skewness and kurtosis based on the central
moments which could be used for the stored, univariate implementations:
http://mathworld.wolfram.com/Skewness.html
http://mathworld.wolfram.com/Kurtosis.html
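
As a rough sketch of what the stored (all values in memory) case could look
like, the Python below computes skewness as mu3 / mu2^(3/2) and kurtosis
excess as mu4 / mu2^2 - 3 from central moments. The function names are
hypothetical, and the use of the population (divide by n) moments is an
assumption; an implementation might prefer bias-corrected sample estimators.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n (population form, an assumption)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    """Skewness as mu3 / mu2^(3/2)."""
    mu2 = central_moment(values, 2)
    mu3 = central_moment(values, 3)
    return mu3 / mu2 ** 1.5


def kurtosis_excess(values):
    """Kurtosis excess as mu4 / mu2^2 - 3."""
    mu2 = central_moment(values, 2)
    mu4 = central_moment(values, 4)
    return mu4 / (mu2 * mu2) - 3.0


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(skewness(sample), kurtosis_excess(sample))
```

Because every term is a deviation from the mean, no large raw sums are
subtracted from one another.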
As for the rolling implementations, there might be some more research
involved before using this method because of their memoryless property. But
for starters, the sum and sumsq can easily be replaced with their central
moment counterparts, mean and variance. There are formulas that update those
metrics when a new value is added. Weisberg's "Applied Linear Regression"
outlines two such updating formulas for mean and sum of squares which are
numerically superior to direct computation and the raw moment methods.
mean[0] = 0
mean[m + 1] = mean[m] + ((1 / (m + 1)) * (x[m + 1] - mean[m]))
ss[0] = 0
ss[m + 1] = ss[m] + ((m / (m + 1)) * (x[m + 1] - mean[m])^2)
where mean[m] is the mean of the first m values,
x[m] is the mth value,
and ss[m] is the sum of squared deviations from the mean for the first m values
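
The two updating formulas translate almost line for line into code. The
Python below is only a sketch; the function name and the (count, mean, ss)
state tuple are assumptions made for the example, not an existing API.

```python
def update_mean_ss(m, mean, ss, x):
    """Fold a new value x into the mean and sum of squares of m earlier values.

    Returns the updated (count, mean, ss).
    """
    dev = x - mean                            # x[m + 1] - mean[m]
    new_mean = mean + dev / (m + 1)           # mean[m + 1]
    new_ss = ss + (m / (m + 1)) * dev * dev   # ss[m + 1]
    return m + 1, new_mean, new_ss


# Start from the empty state: mean[0] = 0, ss[0] = 0.
count, mean, ss = 0, 0.0, 0.0
for value in [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]:
    count, mean, ss = update_mean_ss(count, mean, ss, value)

print(count, mean, ss)
```

Only the running count, mean, and ss are kept between calls, so this fits
the rolling case where the individual values are not stored.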
The sum of squares formula could then be used to derive a similar formula
for variance:
var[0] = 0.0
var[m + 1] = ((m - 1) / m) * var[m] + ((1 / (m + 1)) * (x[m + 1] -
mean[m])^2)
where var[m] is the sample variance for the first m values
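
A sketch of the variance recurrence in Python follows. Because the update
divides by m, this version assumes the recurrence is applied from m = 1
onward, with the first value setting the mean and a variance of 0. That
starting convention and the function name are choices made for this example.

```python
def running_variance(values):
    """Sample variance via the updating formula, one value at a time."""
    if not values:
        return 0.0
    m = 1
    mean = values[0]
    var = 0.0  # assumed starting point: variance of a single value is 0
    for x in values[1:]:
        dev = x - mean                                     # x[m + 1] - mean[m]
        var = ((m - 1) / m) * var + dev * dev / (m + 1)    # var[m + 1]
        mean = mean + dev / (m + 1)                        # mean[m + 1]
        m += 1
    return var


print(running_variance([1.0, 2.0, 3.0, 4.0, 5.0]))
```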
I'd be surprised if similar updating formulas didn't exist for the third and
fourth central moments. I'll look into it further.
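
For what it's worth, one-pass updates of this kind for the third and fourth
central sums are known. The Python sketch below uses one commonly published
form; it does not come from this message or from Weisberg, and the function
name and state layout are hypothetical. Here m2, m3, and m4 are the sums of
the 2nd, 3rd, and 4th powers of deviations from the mean, so m2 plays the
role of ss above.

```python
def update_moments(n, mean, m2, m3, m4, x):
    """Fold x into the running mean and central power sums of n earlier values."""
    n1 = n + 1
    delta = x - mean
    delta_n = delta / n1
    delta_n2 = delta_n * delta_n
    term1 = delta * delta_n * n   # equals (n / (n + 1)) * delta^2
    new_mean = mean + delta_n
    # The m4 and m3 updates use the OLD m2 and m3, so compute them first.
    new_m4 = (m4 + term1 * delta_n2 * (n1 * n1 - 3 * n1 + 3)
              + 6 * delta_n2 * m2 - 4 * delta_n * m3)
    new_m3 = m3 + term1 * delta_n * (n1 - 2) - 3 * delta_n * m2
    new_m2 = m2 + term1
    return n1, new_mean, new_m2, new_m3, new_m4


state = (0, 0.0, 0.0, 0.0, 0.0)
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    state = update_moments(*state, value)

n, mean, m2, m3, m4 = state
print(n, mean, m2, m3, m4)
```

Dividing m2, m3, and m4 by n gives the central moments needed for the
skewness and kurtosis formulas.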
Brent Worden
http://www.brent.worden.org

