commons-dev mailing list archives

From Ted Dunning
Subject Re: Additions to support Large Linear Regression problems
Date Fri, 24 Jun 2011 19:10:29 GMT
Mahout has this.

We have an LSMR implementation that can accept a generic linear operator.
You can implement this linear operator as an out of core multiplication or
as a cluster operation.
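The point here is that an iterative least-squares solver of the LSMR family never needs the matrix A itself. It only needs the products A·x and Aᵀ·y, so those two products can be computed out of core or on a cluster. The sketch below illustrates that idea. It does not use Mahout's API: the `LinearOperator` interface and its `times`/`transposeTimes` methods are hypothetical names. It also substitutes the simpler CGLS method (conjugate gradients on the normal equations) for LSMR. CGLS is generally less robust than LSMR on ill-conditioned problems, but it touches the operator in the same way.

```java
import java.util.Arrays;

public class Cgls {
    /** Hypothetical operator interface: the solver only ever asks for A*x and A^T*y. */
    interface LinearOperator {
        int rows();
        int cols();
        double[] times(double[] x);          // A x, length rows()
        double[] transposeTimes(double[] y); // A^T y, length cols()
    }

    /** In-memory operator for the demo; an out-of-core version would stream rows from disk. */
    static final class DenseOperator implements LinearOperator {
        private final double[][] a;
        DenseOperator(double[][] a) { this.a = a; }
        public int rows() { return a.length; }
        public int cols() { return a[0].length; }
        public double[] times(double[] x) {
            double[] out = new double[a.length];
            for (int i = 0; i < a.length; i++)
                for (int j = 0; j < x.length; j++) out[i] += a[i][j] * x[j];
            return out;
        }
        public double[] transposeTimes(double[] y) {
            double[] out = new double[a[0].length];
            for (int i = 0; i < a.length; i++)
                for (int j = 0; j < out.length; j++) out[j] += a[i][j] * y[i];
            return out;
        }
    }

    static double dot(double[] u, double[] v) {
        double s = 0;
        for (int i = 0; i < u.length; i++) s += u[i] * v[i];
        return s;
    }

    /** CGLS: minimizes ||A x - b|| using only times() and transposeTimes(). */
    static double[] solve(LinearOperator a, double[] b, int maxIter, double tol) {
        double[] x = new double[a.cols()];  // start from x = 0
        double[] r = b.clone();             // residual b - A x
        double[] s = a.transposeTimes(r);   // gradient direction A^T r
        double[] p = s.clone();             // search direction
        double gamma = dot(s, s);
        for (int k = 0; k < maxIter && Math.sqrt(gamma) > tol; k++) {
            double[] q = a.times(p);
            double alpha = gamma / dot(q, q);
            for (int j = 0; j < x.length; j++) x[j] += alpha * p[j];
            for (int i = 0; i < r.length; i++) r[i] -= alpha * q[i];
            s = a.transposeTimes(r);
            double gammaNew = dot(s, s);
            double beta = gammaNew / gamma;
            for (int j = 0; j < p.length; j++) p[j] = s[j] + beta * p[j];
            gamma = gammaNew;
        }
        return x;
    }

    public static void main(String[] args) {
        // Fit y = 1 + 2t on exact data; the design matrix columns are [1, t].
        double[][] data = {{1, 0}, {1, 1}, {1, 2}, {1, 3}};
        double[] y = {1, 3, 5, 7};
        System.out.println(Arrays.toString(solve(new DenseOperator(data), y, 100, 1e-12)));
    }
}
```

Swapping `DenseOperator` for an implementation that streams rows (or farms out the two products to a cluster) leaves `solve` unchanged, which is the property that makes this approach suitable for data that does not fit in memory.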

You don't say how large you want the system to be or whether you have sparse
data.  That might change the answer.


On Fri, Jun 24, 2011 at 11:44 AM, Greg Sterijevski wrote:

> Hello All,
> I have been a user of the math commons jar for a little over a year and am
> very impressed with it. I was wondering whether anyone is actively working
> on implementing functionality to do regressions on very very large data
> sets. The current implementation of the OLS routine is an in-core QR
> decomposition with substitution. While the solutions are typically
> accurate, the in-core nature limits the usefulness of these objects.
> Looking through the code, most of the implementation of an
> InputStream-based regression routine would respect the contract implicit in
> the MultipleLinearRegression interface. However, large regression problems
> are important enough that there should be a way to:
> 1. Wrap a potentially large data source, perhaps as an InputStream of some
> sort.
> 2. Have a separate contract with methods like clear() (to clear whatever
> intermediate calculations are stored) and regress(), which generates
> immutable results that are not affected by further updates of the data.
> I would appreciate any thoughts or comments, as well as suggestions about
> functionality already in Commons Math that might address some of the points
> I raised.
> Thank you,
> -Greg
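One way to meet the contract sketched in the quoted message, assuming the number of regressors p is small enough that a p×p matrix fits in memory, is to keep only the running sums XᵀX and Xᵀy and solve the normal equations when regress() is called. The class and method names below (`StreamingOls`, `addObservation`, `clear`, `regress`, `RegressionResult`) are hypothetical and are not part of Commons Math. Forming XᵀX squares the condition number of the problem, so this is less accurate than the in-core QR approach; an updating QR (for example, Givens rotations applied row by row) keeps the same O(p²) memory with better numerical stability.

```java
import java.util.Arrays;

public class StreamingOls {
    /** Immutable result: holds a private copy, so later updates cannot change it. */
    public static final class RegressionResult {
        private final double[] beta;
        private final long n;
        RegressionResult(double[] beta, long n) { this.beta = beta.clone(); this.n = n; }
        public double[] getBeta() { return beta.clone(); }
        public long getN() { return n; }
    }

    private final int p;          // number of regressors (include a column of 1s for an intercept)
    private final double[][] xtx; // running X^T X (p x p)
    private final double[] xty;   // running X^T y
    private long n;               // observations seen so far

    public StreamingOls(int p) { this.p = p; xtx = new double[p][p]; xty = new double[p]; }

    /** Folds one row into the sums; memory stays O(p^2) no matter how many rows arrive. */
    public void addObservation(double[] x, double y) {
        if (x.length != p) throw new IllegalArgumentException("expected " + p + " regressors");
        for (int i = 0; i < p; i++) {
            xty[i] += x[i] * y;
            for (int j = 0; j < p; j++) xtx[i][j] += x[i] * x[j];
        }
        n++;
    }

    /** Discards all accumulated sums. */
    public void clear() {
        for (double[] row : xtx) Arrays.fill(row, 0.0);
        Arrays.fill(xty, 0.0);
        n = 0;
    }

    /** Solves X^T X b = X^T y via the Cholesky factorization X^T X = L L^T. */
    public RegressionResult regress() {
        double[][] l = new double[p][p];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j <= i; j++) {
                double s = xtx[i][j];
                for (int k = 0; k < j; k++) s -= l[i][k] * l[j][k];
                if (i == j) {
                    if (s <= 0) throw new ArithmeticException("X^T X is not positive definite");
                    l[i][i] = Math.sqrt(s);
                } else {
                    l[i][j] = s / l[j][j];
                }
            }
        }
        double[] z = new double[p];  // forward substitution: L z = X^T y
        for (int i = 0; i < p; i++) {
            double s = xty[i];
            for (int k = 0; k < i; k++) s -= l[i][k] * z[k];
            z[i] = s / l[i][i];
        }
        double[] b = new double[p];  // back substitution: L^T b = z
        for (int i = p - 1; i >= 0; i--) {
            double s = z[i];
            for (int k = i + 1; k < p; k++) s -= l[k][i] * b[k];
            b[i] = s / l[i][i];
        }
        return new RegressionResult(b, n);
    }

    public static void main(String[] args) {
        StreamingOls ols = new StreamingOls(2);
        for (int t = 0; t < 4; t++) ols.addObservation(new double[] {1, t}, 1 + 2 * t);
        RegressionResult r = ols.regress();
        ols.addObservation(new double[] {1, 10}, 0);  // a later update does not alter r
        System.out.println(Arrays.toString(r.getBeta()) + " n=" + r.getN());
    }
}
```

Because `RegressionResult` copies the coefficients on the way in and out, a result obtained from regress() stays fixed while the caller keeps streaming rows or calls clear().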
