commons-user mailing list archives

From Luc Maisonobe <Luc.Maison...@free.fr>
Subject Re: [math] Utility method to aggregate Statistics
Date Tue, 09 Sep 2008 18:53:46 GMT
André Panisson wrote:
> Hello,

Hi,

First of all, please add the name of the component in square brackets at
the beginning of the subject when you post a message to this list. I
have done it here with [math], as your question concerns commons-math.
This list is shared by all commons sub-projects, and this policy helps
people filter their mail.

> 
> I'm writing a complex validation algorithm, that makes a K-Fold 
> cross-validation using a data set. The data set is partitioned into K
>  subsamples, and of the K subsamples, a single subsample is retained 
> as the validation data for testing, and the remaining K − 1 
> subsamples are used as training data. The process is then repeated K 
> times, and at the end the K results are aggregated to a single 
> result. The problem is that all K results return Statistics objects 
> (org.apache.commons.math.stat.descriptive.SummaryStatistics), and I 
> need to make the aggregation of all K objects in a single Statistics.
> I think it is a common problem in the statistics field. Has anyone
> already implemented a utility method to do it?

There is no such feature currently in commons-math. The
SummaryStatistics class wraps a bunch of specialized statistics classes
(Sum, Mean, Max, SumOfSquares ...) which can be overridden by
user-provided StorelessUnivariateStatistic implementations.

So this feature should be added to the StorelessUnivariateStatistic
interface and all its implementations, with a signature like this:
 public void aggregate(StorelessUnivariateStatistic otherStatistic);

The implementation of this method should only use the
StorelessUnivariateStatistic methods, i.e. getResult() and getN(). This
seems feasible for the statistics used by SummaryStatistics, but has not
been done yet.
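
As a rough illustration of what such an implementation might look like,
here is a minimal sketch for a sum and a mean that merges two instances
using only getResult() and getN(). The Stat interface and the SumStat and
MeanStat classes are hypothetical stand-ins written for this example, not
commons-math classes, and the handling of empty statistics is an
assumption of the sketch.

```java
// Hypothetical stand-in for StorelessUnivariateStatistic, reduced to the
// methods this sketch needs. aggregate() is the proposed addition.
interface Stat {
    void increment(double value);
    double getResult();
    long getN();
    void aggregate(Stat other);
}

final class SumStat implements Stat {
    private double sum = 0.0;
    private long n = 0;

    public void increment(double value) { sum += value; n++; }
    public double getResult() { return sum; }
    public long getN() { return n; }

    public void aggregate(Stat other) {
        // Skip empty statistics so their result value is never read.
        if (other.getN() == 0) { return; }
        sum += other.getResult();
        n += other.getN();
    }
}

final class MeanStat implements Stat {
    private double mean = Double.NaN; // NaN while empty (convention of this sketch)
    private long n = 0;

    public void increment(double value) {
        n++;
        mean = (n == 1) ? value : mean + (value - mean) / n;
    }
    public double getResult() { return mean; }
    public long getN() { return n; }

    public void aggregate(Stat other) {
        long m = other.getN();
        if (m == 0) { return; }
        if (n == 0) { mean = other.getResult(); n = m; return; }
        // Weighted update: each mean counts in proportion to its sample size.
        long total = n + m;
        mean += (other.getResult() - mean) * m / total;
        n = total;
    }
}

final class AggregateSketch {
    public static void main(String[] args) {
        // Two folds standing in for the K per-fold results.
        double[][] folds = { {1.0, 2.0, 3.0}, {4.0, 5.0} };
        Stat sum = new SumStat();
        Stat mean = new MeanStat();
        for (double[] fold : folds) {
            Stat s = new SumStat();
            Stat m = new MeanStat();
            for (double v : fold) { s.increment(v); m.increment(v); }
            sum.aggregate(s);
            mean.aggregate(m);
        }
        System.out.println(sum.getResult() + " " + mean.getResult() + " " + mean.getN());
    }
}
```

Min and Max would follow the same pattern as the sum, taking the smaller
or larger of the two results instead of adding them.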

One should be aware that SummaryStatistics does not enforce strong
typing, so one could call aggregate on a Sum instance and provide it a
Min instance, which would of course produce meaningless results.
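
To make the pitfall concrete, the sketch below shows one possible
runtime guard against mixing statistics of different kinds. The
Statistic, RunningSum and RunningMin types are hypothetical and written
only for this example. The class check is an assumption of the sketch,
not something commons-math does.

```java
// Hypothetical interface, standing in for StorelessUnivariateStatistic
// with the proposed aggregate() method added.
interface Statistic {
    void increment(double value);
    double getResult();
    long getN();
    void aggregate(Statistic other);
}

final class RunningSum implements Statistic {
    private double sum = 0.0;
    private long n = 0;

    public void increment(double value) { sum += value; n++; }
    public double getResult() { return sum; }
    public long getN() { return n; }

    public void aggregate(Statistic other) {
        // The interface cannot stop a caller from passing a minimum here,
        // so this sketch rejects other kinds of statistic at run time.
        if (!(other instanceof RunningSum)) {
            throw new IllegalArgumentException("cannot aggregate "
                + other.getClass().getSimpleName() + " into a sum");
        }
        sum += other.getResult();
        n += other.getN();
    }
}

final class RunningMin implements Statistic {
    private double min = Double.POSITIVE_INFINITY;
    private long n = 0;

    public void increment(double value) { min = Math.min(min, value); n++; }
    public double getResult() { return min; }
    public long getN() { return n; }

    public void aggregate(Statistic other) {
        if (!(other instanceof RunningMin)) {
            throw new IllegalArgumentException("cannot aggregate "
                + other.getClass().getSimpleName() + " into a minimum");
        }
        if (other.getN() > 0) { min = Math.min(min, other.getResult()); }
        n += other.getN();
    }
}

final class TypingPitfallDemo {
    public static void main(String[] args) {
        Statistic sum = new RunningSum();
        Statistic min = new RunningMin();
        sum.increment(10.0);
        min.increment(2.0);
        try {
            sum.aggregate(min); // compiles, because both are Statistic
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A class check like this is stricter than strictly necessary: it would
also reject a user-provided implementation that computes the same
statistic, so it is a trade-off rather than a complete answer.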

> Or maybe it would be interesting to request it as an Improvement to 
> the Commons Math developers, adding an "aggregator" to all Statistics
>  implementations?

If you want to request this improvement, please open a ticket for it
using our JIRA tracking system:
http://issues.apache.org/jira/browse/MATH. You'll have to register to be
able to add your feature request. You can also provide a patch if you
want to contribute it yourself.

Luc

> 
> Thanks in advance,
> 
> Andre Panisson
> 
> ---------------------------------------------------------------------
>  To unsubscribe, e-mail: user-unsubscribe@commons.apache.org For 
> additional commands, e-mail: user-help@commons.apache.org
> 
> 


