mahout-dev mailing list archives

From Dhruv Kumar <dku...@ecs.umass.edu>
Subject Re: Possible contributions
Date Wed, 18 May 2011 15:01:00 GMT
On Wed, May 18, 2011 at 6:38 AM, Sean Owen <srowen@gmail.com> wrote:

> I think it first has to finish embracing MapReduce! The code base already
> uses 2.5 different versions of Hadoop. It would be better to clean up the
> modest clutter of approaches we already have before thinking about
> extending it.
>


For the GSoC project, which version of the Hadoop API should I follow?



> Good news is there's a fair bit of time before any other particular
> framework becomes widely used enough to merit thinking hard about.
>
> And I do think we need to focus on cleanup now rather than later. For
> example I will shortly suggest deprecating M/R jobs that use Hadoop 0.19
> APIs in the name of moving forward.
>
> On Wed, May 18, 2011 at 11:23 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> > This is a theme that is going to raise itself over and over.
> >
> > I think that strategically, Mahout is going to have to embrace the
> > MapReduce nextGen work so that we can have flexible computation models.
> > We already need this with all the large scale SVD work.  We could very
> > much use it for the SGD stuff.  Now this gradient work could use it.
> >
> > New needs aren't going to stop.
> >
> > On Tue, May 17, 2011 at 10:17 PM, Hector Yee <hector.yee@gmail.com> wrote:
> >
> > > Re: boosting scalability, I've implemented it on thousands of machines,
> > > but not with mapreduce, rather with direct RPC calls. The gradient
> > > computation tends to be iterative, so one way to do it is to have each
> > > iteration run per mapreduce.
> > > Compute gradients in the mapper, gather them in the reducer, rinse and
> > > repeat.
> > >
> >
>
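
A minimal sketch of the scheme quoted above (compute gradients in the
mapper, gather them in the reducer, repeat once per iteration). It is plain
Python that only imitates the map and reduce phases in a single process; it
does not use Hadoop or any Mahout API. The model (a one-parameter linear fit
y = w * x with squared error), the function names, the learning rate, and
the toy data are all assumptions made for illustration.

```python
def map_gradient(shard, w):
    """Mapper stand-in: partial squared-error gradient over one data shard."""
    grad = 0.0
    for x, y in shard:
        grad += 2.0 * (w * x - y) * x
    # Emit the partial sum together with the number of examples it covers.
    return grad, len(shard)


def reduce_gradients(partials):
    """Reducer stand-in: combine partial gradients into one mean gradient."""
    total_grad = sum(g for g, _ in partials)
    total_n = sum(n for _, n in partials)
    return total_grad / total_n


def train(shards, iterations=200, learning_rate=0.05):
    """Driver: one simulated map/reduce pass per gradient step."""
    w = 0.0
    for _ in range(iterations):
        partials = [map_gradient(shard, w) for shard in shards]  # map phase
        w -= learning_rate * reduce_gradients(partials)          # reduce + update
    return w


# Toy data generated from y = 3 * x, split into two shards (hypothetical).
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = train(shards)
print(w)
```

On a real cluster each pass of the loop in train() would be a separate job
submission, and the current value of w would have to be shipped to every
mapper, which is where the per-iteration overhead of this approach comes
from.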
