mahout-dev mailing list archives

From Sean Owen <>
Subject Re: Possible contributions
Date Wed, 18 May 2011 10:38:54 GMT
I think it first has to finish embracing MapReduce! The code base already
uses 2.5 different versions of Hadoop. It would be better to clean up the
modest clutter of approaches we already have before thinking about extending

Good news is there's a fair bit of time before any other particular
framework becomes widely used enough to merit thinking hard about.

And I do think we need to focus on cleanup now rather than later. For
example, I will shortly suggest deprecating M/R jobs that use Hadoop 0.19
APIs in the name of moving forward.

On Wed, May 18, 2011 at 11:23 AM, Ted Dunning <> wrote:

> This is a theme that is going to raise itself over and over.
> I think that strategically, Mahout is going to have to embrace the MapReduce
> nextGen work so that we can have flexible computation models.  We already
> need this with all the large scale SVD work.  We could very much use it for
> the SGD stuff.  Now this gradient work could use it.
> New needs aren't going to stop.
> On Tue, May 17, 2011 at 10:17 PM, Hector Yee <> wrote:
> > Re: boosting scalability, I've implemented it on thousands of machines, but
> > not with mapreduce, rather with direct RPC calls. The gradient computation
> > tends to be iterative, so one way to do it is to have each iteration run per
> > mapreduce.
> > Compute gradients in the mapper, gather them in the reducer, rinse and
> > repeat.
> >
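
The scheme Hector describes above (compute gradients in the mapper, gather them in the reducer, repeat once per iteration) can be sketched roughly as follows. This is a minimal single-process illustration in plain Python, not Mahout or Hadoop code. The squared-error loss, the function names (map_gradient, reduce_gradients, train), the learning rate, and the toy data are all assumptions made for the example; in a real job each shard would be an input split and each loop iteration a separate MapReduce pass.

```python
from functools import reduce


def map_gradient(weights, shard):
    """Mapper (hypothetical): partial squared-error gradient over one shard."""
    grad = [0.0] * len(weights)
    for features, target in shard:
        prediction = sum(w * x for w, x in zip(weights, features))
        error = prediction - target
        for i, x in enumerate(features):
            grad[i] += error * x
    # Emit the partial gradient together with the number of examples seen.
    return grad, len(shard)


def reduce_gradients(a, b):
    """Reducer (hypothetical): sum partial gradients and example counts."""
    grad_a, n_a = a
    grad_b, n_b = b
    return [x + y for x, y in zip(grad_a, grad_b)], n_a + n_b


def train(shards, num_features, iterations=500, learning_rate=0.1):
    """Driver: one map/reduce pass per gradient step, 'rinse and repeat'."""
    weights = [0.0] * num_features
    for _ in range(iterations):
        partials = [map_gradient(weights, shard) for shard in shards]  # map
        total, count = reduce(reduce_gradients, partials)              # reduce
        weights = [w - learning_rate * g / count
                   for w, g in zip(weights, total)]
    return weights


# Toy data (assumed): y = 1 + 2*x, features are [bias, x], split over 2 shards.
shards = [
    [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)],
    [([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)],
]
weights = train(shards, num_features=2)
print(weights)
```

The driver loop is where the cost shows up: every gradient step needs the current weights shipped to all mappers and a full pass over the data, which is why per-iteration job overhead matters for this style of algorithm.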
