commons-user mailing list archives

From "André Panisson" <panis...@gmail.com>
Subject Re: [math] Utility method to aggregate Statistics
Date Tue, 09 Sep 2008 19:14:01 GMT
Hi Luc,
Thank you for your attention. I'll open a ticket for it in the JIRA
tracking system, and the details can be discussed there.
André

On Tue, Sep 9, 2008 at 8:53 PM, Luc Maisonobe <Luc.Maisonobe@free.fr> wrote:
> André Panisson wrote:
>> Hello,
>
> Hi,
>
> First of all, please add the name of the component in square brackets at
> the beginning of the subject when you post a message to this list. I
> have done it here with [math], as your question concerns commons-math.
> This list is shared by all commons sub-projects, and this policy helps
> people filter their mail.
>
>>
>> I'm writing a complex validation algorithm that performs K-fold
>> cross-validation on a data set. The data set is partitioned into K
>> subsamples; of these, a single subsample is retained as the
>> validation data for testing, and the remaining K − 1 subsamples are
>> used as training data. The process is then repeated K times, and at
>> the end the K results are aggregated into a single result. The
>> problem is that each of the K runs returns a Statistics object
>> (org.apache.commons.math.stat.descriptive.SummaryStatistics), and I
>> need to aggregate all K objects into a single Statistics object.
>> I think this is a common problem in statistics. Has anyone already
>> implemented a utility method to do it?
>
> There is no such feature currently in commons-math. The
> SummaryStatistics class wraps a bunch of specialized statistics classes
> (Sum, Mean, Max, SumOfSquares ...) which can be overridden by
> user-provided StorelessUnivariateStatistic implementations.
>
> So this feature should be added to the StorelessUnivariateStatistic
> interface and all its implementations, with a signature like this:
>  public void aggregate(StorelessUnivariateStatistic otherStatistic);
>
> The implementation of this method should only use the
> StorelessUnivariateStatistic methods, i.e. getResult() and getN(). This
> seems feasible for the statistics used by SummaryStatistics, but has not
> been done yet.
>
> One should be aware that SummaryStatistics does not enforce strong
> typing, so one could call aggregate on a Sum instance and pass it a
> Min instance, which would of course produce meaningless results.
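The proposal above could look roughly like the following sketch. MergeableStatistic, SumStat, MeanStat and MaxStat are hypothetical, simplified stand-ins invented for this illustration; they are not the commons-math interface or its implementations. Each aggregate method relies only on the other statistic's getResult() and getN(), as suggested.

```java
/** Hypothetical, simplified stand-in for a storeless statistic (not the commons-math interface). */
interface MergeableStatistic {
    void increment(double value);
    double getResult();
    long getN();
    /** Folds another statistic into this one, using only its getResult() and getN(). */
    void aggregate(MergeableStatistic other);
}

final class SumStat implements MergeableStatistic {
    private double sum;
    private long n;

    public void increment(double value) { sum += value; n++; }
    public double getResult() { return sum; }
    public long getN() { return n; }

    public void aggregate(MergeableStatistic other) {
        if (other.getN() == 0) {
            return; // nothing to add
        }
        sum += other.getResult();
        n += other.getN();
    }
}

final class MeanStat implements MergeableStatistic {
    private double mean;
    private long n;

    public void increment(double value) { n++; mean += (value - mean) / n; }
    public double getResult() { return n == 0 ? Double.NaN : mean; }
    public long getN() { return n; }

    public void aggregate(MergeableStatistic other) {
        if (other.getN() == 0) {
            return;
        }
        long total = n + other.getN();
        // Weighted update: shift the mean toward the other mean in proportion to its count.
        mean += (other.getResult() - mean) * other.getN() / total;
        n = total;
    }
}

final class MaxStat implements MergeableStatistic {
    private double max = Double.NEGATIVE_INFINITY;
    private long n;

    public void increment(double value) { max = Math.max(max, value); n++; }
    public double getResult() { return n == 0 ? Double.NaN : max; }
    public long getN() { return n; }

    public void aggregate(MergeableStatistic other) {
        if (other.getN() == 0) {
            return;
        }
        max = Math.max(max, other.getResult());
        n += other.getN();
    }
}
```

In this sketch nothing stops a caller from writing sumStat.aggregate(maxStat): it compiles, and the result is meaningless, which is the typing hazard described above.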
>
>> Or maybe it would be interesting to request it as an Improvement to
>> the Commons Math developers, adding an "aggregator" to all Statistics
>>  implementations?
>
> If you want to request this improvement, please open a ticket for it
> using our JIRA tracking system:
> http://issues.apache.org/jira/browse/MATH. You'll have to register to be
> able to add your feature request. You can also provide a patch if you
> want to contribute it yourself.
>
> Luc
>
>>
>> Thanks in advance,
>>
>> Andre Panisson
>>
>> ---------------------------------------------------------------------
>>  To unsubscribe, e-mail: user-unsubscribe@commons.apache.org For
>> additional commands, e-mail: user-help@commons.apache.org
>>
>>