Return-Path:
X-Original-To: apmail-commons-issues-archive@minotaur.apache.org
Delivered-To: apmail-commons-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id B38E19B5A for ; Mon, 23 Jul 2012 20:43:36 +0000 (UTC)
Received: (qmail 56595 invoked by uid 500); 23 Jul 2012 20:43:35 -0000
Delivered-To: apmail-commons-issues-archive@commons.apache.org
Received: (qmail 56459 invoked by uid 500); 23 Jul 2012 20:43:35 -0000
Mailing-List: contact issues-help@commons.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: issues@commons.apache.org
Delivered-To: mailing list issues@commons.apache.org
Received: (qmail 55985 invoked by uid 99); 23 Jul 2012 20:43:34 -0000
Received: from issues-vm.apache.org (HELO issues-vm) (140.211.11.160) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 23 Jul 2012 20:43:34 +0000
Received: from isssues-vm.apache.org (localhost [127.0.0.1]) by issues-vm (Postfix) with ESMTP id D5453142856 for ; Mon, 23 Jul 2012 20:43:34 +0000 (UTC)
Date: Mon, 23 Jul 2012 20:43:34 +0000 (UTC)
From: "Thomas Neidhart (JIRA)"
To: issues@commons.apache.org
Message-ID: <1763774670.92825.1343076214880.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <68959041.4484.1309409848821.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420949#comment-13420949 ]

Thomas Neidhart commented on MATH-607:
--------------------------------------

The patch has already been applied for 3.0.
Is there still something to do?

> Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma
>             Fix For: 3.1
>
>         Attachments: RegressResults2, millerreg, millerreg_take2, millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete data set. This necessitates the loading incore of the complete dataset. For large datasets, or large datasets and a requirement to do datamining or stepwise regression this is not practical. There are techniques which form the normal equations on the fly, as well as ones which form the QR decomposition on an update basis. I am proposing, first, the specification of an "UpdatingLinearRegression" interface which defines basic functionality all such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object. This is to ensure that subsequent additions of observations do not corrupt or render inconsistent parameter estimates. I am calling this interface "RegressionResults".
> Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.