mahout-dev mailing list archives

From Sean Owen
Subject Re: GraphLab
Date Tue, 08 Mar 2011 23:26:34 GMT
Looks interesting -- it looks like a specialization for iterative
algorithms of a certain kind, a kind which describes a lot of
algorithms. Is this distributed? It looked more like it's intended for
high-performance machines. I guess it's also different being C++-based
and not Hadoop-based.

Hadoop is, in the end, a tool that was never conceived for general
distributed computation. But among frameworks it's (relatively) well
understood and available. It seems like Mahout has taken on the
mission of delivering something that works on the framework that's out
there now, which is a practical rather than theoretically-motivated
goal. (I think it's a good goal too.) I see that as a difference from
many research-oriented projects.

Beyond that it is the same sort of thing and that's good.

The thing I "worry" most about us duplicating is actually Pig. It
at least gives something more like "primitives" for basic
information-shuffling operations on Hadoop like the sorts of pivots
and joins and filters that go into your standard implementation of an
ML algorithm. I bet we'd find we'd be better off bringing in some
stuff from Pig rather than reinvent the join a few times over.

But first things first... would really be good to focus on revamping
and bringing together what we have already to pull together
commonality and such before thinking what we can improve about those

On Tue, Mar 8, 2011 at 11:07 PM, Shannon Quinn wrote:
> Being the newbie on the block, forgive me if I'm rehashing old news: has
> anyone seen/heard of GraphLab before?
> It's written by someone who has an office in the same exact building as I
> do, just one floor up, so I'll certainly be talking to him soon. But if
> there is someone here who is familiar with this work, can you elaborate on
> the differences between it and Mahout? He seems to have somewhat tweaked the
> standard map/reduce paradigm into something that offers more crosstalk
> flexibility between nodes at runtime (at the cost of significant
> configurational overhead, most likely), but beyond that it seems strikingly
> similar to the functionality Mahout provides.
> Anyway, was pointed to this by someone in my department while I was running
> my coalescing thesis ideas by him.
> Shannon
