Albretch Mueller wrote:
>> Anyone else have ideas on how best to do this?
>>
> ~
> IMHO this is so simple that I would doubt there is a "best" way to do it ;).
>
What I meant was the best way to define the API. There are also
numerical considerations.
In the commons math descriptive statistics package, we have two kinds of
univariate statistics: those that "store" associated data values, and
those that do not. The DescriptiveStatistics class aggregates
statistics based on a "stored" dataset (so should not be used for very
large datasets, unless "rolling" statistics are desired). The class to
look at to design updating functions like you are suggesting is
SummaryStatistics, which does not store data values, but updates
internal sums, etc. The updating formulas for moment statistics (mean,
variance, skewness) are not naive, so the utility functions that you are
proposing would have to be coded with care so that results would match,
or at least be documented appropriately.
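
To make the numerical point concrete, here is a minimal sketch of a
one-pass update for the mean and sample variance using Welford's method,
which avoids the cancellation you get from keeping a naive sum of
squares. The class name RunningStats and its methods are hypothetical and
not part of commons math, and I am not claiming this is exactly what
SummaryStatistics does internally; returning NaN for too few values is
also just a choice made for this sketch.

```java
// Hypothetical illustration only; not part of Commons Math.
class RunningStats {
    private long n = 0;        // number of values seen so far
    private double mean = 0.0; // running mean
    private double m2 = 0.0;   // running sum of squared deviations from the mean

    /** Updates the statistics with one new value without storing it. */
    void add(double x) {
        n++;
        double delta = x - mean;   // deviation from the old mean
        mean += delta / n;         // new mean
        m2 += delta * (x - mean);  // uses both the old and the new mean
    }

    long getN() {
        return n;
    }

    /** Mean of the values added so far, or NaN if none (sketch's choice). */
    double getMean() {
        return n == 0 ? Double.NaN : mean;
    }

    /** Bias-corrected sample variance, or NaN for fewer than two values. */
    double getVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }
}
```

Only the count, the old mean, and the old sum of squared deviations are
needed to fold in a new entry, which is the relationship described in the
quoted message below. Skewness can be updated along the same lines with a
third-moment accumulator.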
> you are the ones that know well/own the underlying data structures and logic
> ~
> If you take a close look at:
> ~
> http://en.wikipedia.org/wiki/Mean
> http://en.wikipedia.org/wiki/Standard_deviation
> http://en.wikipedia.org/wiki/Skewness
> ~
> You will see that, since these are essentially summations and their
> exponentiations, you could derive the relationship of the new stat
> values based on:
> ~
> 1) the old stat ones,
> 2) the count of how many have been computed so far, and
> 3) the new entry
> ~
> I must run out of my place right now. If no one has done so when I
> come back (within 2 hours) I will digest to
> you/user@commons.apache.org the math, propose some pseudo code along
> with some basic java code on how to do that.
>
Thanks!
It's probably best to take the discussion to the dev list.
Phil

