commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Phil Steitz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
Date Thu, 21 Jul 2011 19:58:57 GMT

    [ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069167#comment-13069167
] 

Phil Steitz commented on MATH-607:
----------------------------------

Checkstyle fixes committed in r1149281, r1149335.  Many thanks, Greg.  Still remaining: 1)
fix exceptions.  I will do this if I get no objections to RegressionModelSpecificationException
proposed on the mailing list.  2) findbugs config to ignore unsafe access check in RegressionResults.
 3) RegressionResults has no getter for rank  4) I am still not getting why we need globalFitInfo
in RegressionResults.  Why not just name the fields instead of maintaining an array and names
for indexes into it?

> Current Multiple Regression Object does calculations with all data incore. There are
non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma
>             Fix For: 3.0
>
>         Attachments: RegressResults2, millerreg, millerreg_take2, millerregtest, regres_change1,
updating_reg_cut2, updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete data set.
This necessitates the loading incore of the complete dataset. For large datasets, or large
datasets and a requirement to do datamining or stepwise regression this is not practical.
There are techniques which form the normal equations on the fly, as well as ones which form
the QR decomposition on an update basis. I am proposing, first, the specification of an "UpdatingLinearRegression"
interface which defines basic functionality all such techniques must fulfill. 
> Related to this 'updating' regression, the results of running a regression on some subset
of the data should be encapsulated in an immutable object. This is to ensure that subsequent
additions of observations do not corrupt or render inconsistent parameter estimates. I am
calling this interface "RegressionResults".  
> Once the community has reached a consensus on the interface, work on the concrete implementation
of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message