mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Re:Mahout compatibility with GraphLab
Date Thu, 01 Sep 2011 23:09:33 GMT
Well, protobufs is always an option as well.

There was a bit of a tiff between Yahoo developers and Cloudera over Avro a
few months ago.  There may be a bit of a residual bias left over from that
history.  Much of the motive force behind MR2.0 is now at Hortonworks so you
could imagine some momentum persisted across the event horizon surrounding
the exit from Yahoo.

(Caveat: I have good friends who use protobufs and friends who use Avro.  I
like them both for different reasons.)

On Thu, Sep 1, 2011 at 5:18 PM, Joseph Gonzalez <joseph.e.gonzalez@gmail.com> wrote:

> With respect to MRv2 I think there are still some issues that need to be
> resolved.  For example I looked at the pipeline:
>
>     Hadoop Graph Creation --> Hadoop/Yarn GraphLab Launch --> Hadoop Post Processing
>
> This requires a common data format.  I constructed a prototype around Avro
> but found that the C++ implementation lacks nested structures, which I
> "needed" to cleanly encode the GraphLab data graph.  Also, surprisingly,
> MRv2 did not seem to have an Avro interface (maybe this is fixed now).
>
> While these problems are not insurmountable, they make it difficult to
> cleanly integrate with Hadoop, Yarn, and Avro while they are transitioning
> to MRv2 (especially when the C++ bindings seem to be updated more slowly).
>
