commons-issues mailing list archives

From "Andre Panisson (JIRA)" <>
Subject [jira] Commented: (MATH-224) Utility method to aggregate Statistics
Date Tue, 02 Dec 2008 10:54:44 GMT


Andre Panisson commented on MATH-224:

I didn't understand why the implementation of aggregation would stop being obvious if the
implementation setup changed. I think one premise for implementing a storeless second moment
and variance is prior knowledge of the mean, n and the previous variance. Without this
information, I don't think it is possible to make an update via single value increment. And the
formula I used to aggregate two variances uses exactly this information: the mean, n and variance
of each statistic. I'm not a mathematician, but I suppose that for the higher moments, the
information needed to update via single value increment is also sufficient to calculate the
aggregated statistics.
The implementation I have now uses the nested first moments only because the required
information is available in the nested objects. But to support single value increment, the
mean, n and variance must be carried somewhere, be it a nested object or not, and this
is sufficient information to calculate the aggregation as well.
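
To illustrate the point, here is a minimal sketch of how the state needed for a single value increment (n, the mean, and the sum of squared deviations, from which the variance follows) is also enough to combine two statistics. It uses the well-known pairwise update for the second moment, which is not necessarily the formula in the attached patch. The class MomentSummary and all of its methods are hypothetical names for illustration and are not part of Commons Math.

```java
/** Hypothetical holder for the running state a storeless variance needs. */
final class MomentSummary {
    final long n;       // number of values seen so far
    final double mean;  // arithmetic mean of those values
    final double m2;    // sum of squared deviations from the mean

    MomentSummary(long n, double mean, double m2) {
        this.n = n;
        this.mean = mean;
        this.m2 = m2;
    }

    /** Single value increment: needs only the previous n, mean and m2. */
    MomentSummary add(double x) {
        long newN = n + 1;
        double delta = x - mean;
        double newMean = mean + delta / newN;
        double newM2 = m2 + delta * (x - newMean);
        return new MomentSummary(newN, newMean, newM2);
    }

    /** Aggregation: needs only n, mean and m2 of each side. */
    MomentSummary merge(MomentSummary other) {
        if (n == 0) {
            return other;
        }
        if (other.n == 0) {
            return this;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        double newMean = mean + delta * other.n / total;
        double newM2 = m2 + other.m2
                + delta * delta * ((double) n * other.n / total);
        return new MomentSummary(total, newMean, newM2);
    }

    /** Unbiased sample variance; NaN when fewer than two values were seen. */
    double variance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }

    /** Builds a summary by feeding the values one at a time. */
    static MomentSummary of(double... values) {
        MomentSummary s = new MomentSummary(0, 0.0, 0.0);
        for (double v : values) {
            s = s.add(v);
        }
        return s;
    }
}
```

Merging MomentSummary.of(1, 2, 3) with MomentSummary.of(4, 5, 6, 7) should give the same n, mean and variance (up to rounding) as feeding all seven values into one summary. If only the variance is stored, m2 can be recovered as variance * (n - 1).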

> Utility method to aggregate Statistics
> --------------------------------------
>                 Key: MATH-224
>                 URL:
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Andre Panisson
>            Assignee: Phil Steitz
>            Priority: Minor
>             Fix For: 2.0
>         Attachments: commons_math.patch
> Below is the conversation related to this topic that was posted to the Commons Users
> -------------------------------------------------
> Hi,
> >
> > I'm writing a complex validation algorithm, that makes a K-Fold
> > cross-validation using a data set. The data set is partitioned into K
> >  subsamples, and of the K subsamples, a single subsample is retained
> > as the validation data for testing, and the remaining K − 1
> > subsamples are used as training data. The process is then repeated K
> > times, and at the end the K results are aggregated to a single
> > result. The problem is that all K results return Statistics objects
> > (org.apache.commons.math.stat.descriptive.SummaryStatistics), and I
> > need to make the aggregation of all K objects in a single Statistics.
> >  I think it is a common problem in the statistics field. Has
> > anyone already implemented a utility method to do it?
> There is no such feature currently in commons-math. The
> SummaryStatistics class wraps a bunch of specialized statistics classes
> (Sum, Mean, Max, SumOfSquares ...) which can be overridden by
> user-provided StorelessUnivariateStatistic implementations.
> So this feature should be added to the StorelessUnivariateStatistic
> interface and all its implementations, with a signature like this:
>  public void aggregate(StorelessUnivariateStatistic otherStatistic);
> The implementation of this method should only use the
> StorelessUnivariateStatistic methods, i.e. getResult() and getN(). This
> seems feasible for the statistics used by SummaryStatistics, but has not
> been done yet.
> One should be aware that SummaryStatistics does not enforce strong
> typing, so one could call aggregate on a Sum instance and provide it a
> Min instance, which would of course result in meaningless results.
> > Or maybe it would be interesting to request it as an Improvement to
> > the Commons Math developers, adding an "aggregator" to all Statistics
> >  implementations?
> If you want to request this improvement, please open a ticket for it
> using our JIRA tracking system:
> You'll have to register to be
> able to add your feature request. You can also provide a patch if you
> want to contribute it by yourself.
> Luc
> >
> > Thanks in advance,
> >
> > Andre Panisson

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
