commons-dev mailing list archives

From Greg Sterijevski <gsterijev...@gmail.com>
Subject Re: Additions to support Large Linear Regression problems
Date Sat, 25 Jun 2011 04:14:33 GMT
Hi Ted,

I will look at the Mahout library. I was not aware of it, and I will see
whether it suits my problems.

A large problem would be one where it does not make sense to pull all the
data into core, whether it's 10 GB or 100 TB. While some of these design
matrices might be sparse, there is no reason to expect them to be.

-Greg




On Fri, Jun 24, 2011 at 2:10 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Mahout has this.
>
> We have an LSMR implementation that can accept a generic linear operator.
>  You can implement this linear operator as an out of core multiplication or
> as a cluster operation.
>
> You don't say how large you want the system to be or whether you have
> sparse data.  That might change the answer.
>
> See http://www.stanford.edu/group/SOL/software/lsmr.html
>
> On Fri, Jun 24, 2011 at 11:44 AM, Greg Sterijevski
> <gsterijevski@gmail.com> wrote:
>
> > Hello All,
> >
> > I have been a user of the math commons jar for a little over a year and am
> > very impressed with it. I was wondering whether anyone is actively working
> > on implementing functionality to do regressions on very very large data
> > sets. The current implementation of the OLS routine is an in-core QR
> > decomposition with substitution. While the solutions are typically
> > accurate, the in-core nature limits the usefulness of these objects.
> >
> > Looking through the code, most of the implementation of an InputStream
> > based regression routine would respect the contract implicit in the
> > interface MultipleLinearRegression. However, large regression problems are
> > important enough that there should be a way to:
> >
> > 1. Wrap a potentially large data source, perhaps as an InputStream of some
> > sort.
> > 2. Have a separate contract with methods like clear() (to clear whatever
> > intermediate calculations are stored), and regress() which generates
> > immutable results that are not affected by further updates of the data.
> >
> > I would appreciate any thoughts or comments, as well as suggestions about
> > functionality already in math commons which might address some points I
> > raised.
> >
> > Thank you,
> >
> > -Greg
> >
>
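
For illustration only, the clear()/regress() contract proposed in the original
message could be sketched roughly as below. The class name StreamingOls and
the method addObservation() are hypothetical and are not part of Commons Math
or Mahout. The sketch assumes the number of regressors p is small enough that
a p-by-p matrix fits in memory, and it accumulates the normal equations (X'X
and X'y) one row at a time. That is a deliberately simple stand-in: it is less
numerically stable than the QR approach mentioned above, and it is not LSMR.

```java
import java.util.Arrays;

/** Hypothetical streaming OLS accumulator (illustrative, not a library API). */
final class StreamingOls {
    private final int p;          // number of regressors; add a constant column yourself
    private final double[][] xtx; // running X'X
    private final double[] xty;   // running X'y
    private long n;               // observations seen since the last clear()

    StreamingOls(int p) {
        this.p = p;
        this.xtx = new double[p][p];
        this.xty = new double[p];
    }

    /** Fold one observation into the running sums; the row itself is not kept. */
    void addObservation(double[] x, double y) {
        if (x.length != p) {
            throw new IllegalArgumentException("expected " + p + " regressors");
        }
        for (int i = 0; i < p; i++) {
            xty[i] += x[i] * y;
            for (int j = 0; j < p; j++) {
                xtx[i][j] += x[i] * x[j];
            }
        }
        n++;
    }

    /** Discard all intermediate calculations. */
    void clear() {
        for (double[] row : xtx) {
            Arrays.fill(row, 0.0);
        }
        Arrays.fill(xty, 0.0);
        n = 0;
    }

    long getN() {
        return n;
    }

    /** Solve (X'X) b = X'y on copies, so later updates cannot change the result. */
    double[] regress() {
        double[][] a = new double[p][];
        for (int i = 0; i < p; i++) {
            a[i] = xtx[i].clone();
        }
        double[] b = xty.clone();
        // Gaussian elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            // Crude absolute threshold, chosen only for this sketch.
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new ArithmeticException("X'X is singular or badly conditioned");
            }
            double[] tmpRow = a[col]; a[col] = a[pivot]; a[pivot] = tmpRow;
            double tmp = b[col]; b[col] = b[pivot]; b[pivot] = tmp;
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                b[r] -= f * b[col];
                for (int c = col; c < p; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }
        // Back substitution on the upper-triangular system.
        double[] beta = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double s = b[i];
            for (int j = i + 1; j < p; j++) {
                s -= a[i][j] * beta[j];
            }
            beta[i] = s / a[i][i];
        }
        return beta;
    }
}
```

A caller would read rows from whatever source holds the data (a file, an
InputStream, a cluster job), pass each one to addObservation(), and call
regress() whenever estimates are needed. Memory use depends on p, not on the
number of rows.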
