From: "John Bollinger (JIRA)"
To: issues@commons.apache.org
Date: Sun, 19 Apr 2009 19:29:47 -0700 (PDT)
Subject: [jira] Commented: (MATH-224) Utility method to aggregate Statistics

    [ https://issues.apache.org/jira/browse/MATH-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700638#action_12700638 ]

John Bollinger commented on MATH-224:
-------------------------------------

For clarity, "the approach described on the dev mailing list" is to
accumulate the aggregate statistics at the same time as the
per-partition statistics. The new class supports doing so with very
little extra work.

> Utility method to aggregate Statistics
> --------------------------------------
>
>                 Key: MATH-224
>                 URL: https://issues.apache.org/jira/browse/MATH-224
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Andre Panisson
>            Assignee: Phil Steitz
>            Priority: Minor
>             Fix For: 2.0
>
>         Attachments: commons_math.patch, math_224.patch
>
>
> Below is the conversation on this topic that was posted to the Commons Users group.
> -------------------------------------------------
> Hi,
> >
> > I'm writing a complex validation algorithm that performs K-fold
> > cross-validation on a data set. The data set is partitioned into K
> > subsamples; a single subsample is retained as the validation data
> > for testing, and the remaining K − 1 subsamples are used as training
> > data. The process is then repeated K times, and at the end the K
> > results are aggregated into a single result. The problem is that
> > each of the K results is a Statistics object
> > (org.apache.commons.math.stat.descriptive.SummaryStatistics), and I
> > need to aggregate all K objects into a single Statistics object.
> > I think this is a common problem in statistics. Has anyone already
> > implemented a utility method to do it?
> There is no such feature currently in commons-math. The
> SummaryStatistics class wraps a bunch of specialized statistics classes
> (Sum, Mean, Max, SumOfSquares ...) which can be overridden by
> user-provided StorelessUnivariateStatistic implementations.
>
> So this feature should be added to the StorelessUnivariateStatistic
> interface and all its implementations, with a signature like this:
>
> public void aggregate(StorelessUnivariateStatistic otherStatistic);
>
> The implementation of this method should only use the
> StorelessUnivariateStatistic methods, i.e. getResult() and getN(). This
> seems feasible for the statistics used by SummaryStatistics, but has not
> been done yet.
>
> One should be aware that SummaryStatistics does not enforce strong
> typing, so one could call aggregate on a Sum instance and pass it a
> Min instance, which would of course produce meaningless results.
>
> > Or maybe it would be worth requesting this as an Improvement from
> > the Commons Math developers, adding an "aggregator" to all Statistics
> > implementations?
>
> If you want to request this improvement, please open a ticket for it
> using our JIRA tracking system:
> http://issues.apache.org/jira/browse/MATH. You'll have to register to be
> able to add your feature request. You can also provide a patch if you
> want to contribute it yourself.
>
> Luc
> >
> > Thanks in advance,
> >
> > Andre Panisson

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
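The aggregate() proposal quoted above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the Commons Math API: Stat, Sum, Mean, Min and AggregateSketch are hypothetical stand-ins, and each aggregate() uses only the other statistic's getResult() and getN(), as the thread suggests. A runtime type check rejects mismatched merges such as a Min passed to a Sum. The extra "running" Mean in main() illustrates the alternative described in the comment: every value is also fed to an aggregate statistic while the per-partition statistics are being accumulated.

```java
/**
 * Sketch of the aggregate() idea discussed in MATH-224. Stat, Sum, Mean and Min
 * are hypothetical stand-ins, not the Commons Math classes of the same names.
 */
public class AggregateSketch {

    /** Hypothetical minimal storeless statistic. */
    interface Stat {
        void increment(double x);
        double getResult();
        long getN();
        /** Merges another statistic of the same kind using only getResult() and getN(). */
        void aggregate(Stat other);
    }

    static final class Sum implements Stat {
        private double sum;
        private long n;
        public void increment(double x) { sum += x; n++; }
        public double getResult() { return sum; }
        public long getN() { return n; }
        public void aggregate(Stat other) {
            // Runtime type check: guards against e.g. merging a Min into a Sum.
            if (!(other instanceof Sum)) throw new IllegalArgumentException("expected a Sum");
            sum += other.getResult();
            n += other.getN();
        }
    }

    static final class Mean implements Stat {
        private double mean;
        private long n;
        public void increment(double x) { n++; mean += (x - mean) / n; }
        public double getResult() { return n == 0 ? Double.NaN : mean; }
        public long getN() { return n; }
        public void aggregate(Stat other) {
            if (!(other instanceof Mean)) throw new IllegalArgumentException("expected a Mean");
            long m = other.getN();
            if (m == 0) return;  // nothing to merge
            long total = n + m;
            // Count-weighted average of the two partial means.
            mean = (n * mean + m * other.getResult()) / total;
            n = total;
        }
    }

    static final class Min implements Stat {
        private double min = Double.POSITIVE_INFINITY;
        private long n;
        public void increment(double x) { min = Math.min(min, x); n++; }
        public double getResult() { return n == 0 ? Double.NaN : min; }
        public long getN() { return n; }
        public void aggregate(Stat other) {
            if (!(other instanceof Min)) throw new IllegalArgumentException("expected a Min");
            if (other.getN() == 0) return;  // an empty Min has no result to merge
            min = Math.min(min, other.getResult());
            n += other.getN();
        }
    }

    public static void main(String[] args) {
        // Three partitions (folds) of the values 1..9.
        double[][] folds = { {1, 2, 3}, {4, 5}, {6, 7, 8, 9} };

        Sum sum = new Sum();
        Mean mean = new Mean();
        Min min = new Min();
        Mean running = new Mean();  // fed every value directly, alongside the per-fold stats

        for (double[] fold : folds) {
            Sum foldSum = new Sum();
            Mean foldMean = new Mean();
            Min foldMin = new Min();
            for (double x : fold) {
                foldSum.increment(x);
                foldMean.increment(x);
                foldMin.increment(x);
                running.increment(x);
            }
            // Per-fold results are available here; merge them into the totals.
            sum.aggregate(foldSum);
            mean.aggregate(foldMean);
            min.aggregate(foldMin);
        }

        System.out.println("n=" + mean.getN() + " sum=" + sum.getResult()
                + " mean=" + mean.getResult() + " min=" + min.getResult()
                + " running mean=" + running.getResult());
    }
}
```

Sum and Min merge trivially, while Mean needs the counts as weights. Merging a variance the same way would also need each part's mean, because the combined variance depends on how far the partial means lie apart.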