giraph-dev mailing list archives

From Pere Ferrera <ferrerabert...@gmail.com>
Subject serialization / deserialization improvement suggestion
Date Wed, 19 Sep 2012 18:19:41 GMT
Hi to all,

I have been taking a look at Giraph's source code. I have noticed the heavy
usage of Writables in it and, even though I don't know many of the details
of the project, I think it would be a good idea to at least consider the
usage of Pangool instead of the Java Hadoop API.

Pangool (http://pangool.net) is a low-level Java API on top of Hadoop that
aims to make several things easier, one of which is dealing with compound
types. Most of the others don't apply to Giraph since you are running
map-only jobs.

The most interesting part of it for Giraph is that you would be able to
build Vertex classes from plain Java types (Integer, Float, ... or arbitrary
serializable Objects) without needing to worry about them being Writable.
This would reduce some of the code and complexity of the project, and it
would allow for more expressive code, decoupled from Hadoop, where user
functions (business logic) operate directly on Java types rather than on
Hadoop types.
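
To make the point concrete, here is a minimal sketch of the hand-written
serialization that a Writable type requires. It is only an illustration: the
local Writable interface copies the two methods of
org.apache.hadoop.io.Writable so that the example compiles without Hadoop,
and VertexValueWritable, with its id and rank fields, is a hypothetical type
rather than a Giraph class. It does not show Pangool's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableSketch {

    // Local stand-in with the same two methods as org.apache.hadoop.io.Writable,
    // declared here so the sketch compiles without Hadoop on the classpath.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical vertex value: every field has to be serialized by hand,
    // and write() and readFields() must agree on the field order.
    static final class VertexValueWritable implements Writable {
        int id;
        float rank;

        // No-arg constructor, needed so an empty instance can be filled in.
        VertexValueWritable() { }

        VertexValueWritable(int id, float rank) {
            this.id = id;
            this.rank = rank;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(id);
            out.writeFloat(rank);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            id = in.readInt();
            rank = in.readFloat();
        }
    }

    // Serializes any Writable into a byte array.
    static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    // Rebuilds a VertexValueWritable from bytes produced by serialize().
    static VertexValueWritable deserialize(byte[] data) throws IOException {
        VertexValueWritable v = new VertexValueWritable();
        v.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        return v;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize(new VertexValueWritable(7, 0.5f));
        VertexValueWritable copy = deserialize(data);
        // prints "7 0.5 (8 bytes)"
        System.out.println(copy.id + " " + copy.rank + " (" + data.length + " bytes)");
    }
}
```

With plain Java types, the write and readFields pair above is the part user
code would no longer have to provide; the business logic would work on the
int and the float directly.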

Pangool has been designed for performance, so it should perform on the same
order as plain Hadoop (we ran a benchmark to show that). Pangool uses
Avro for persisting data. It is being used successfully in production in
some of our consulting projects (datasalt.com), so we contribute actively
to it.

So, if this is of any interest, I will be glad to submit a proposal as a
patch and contribute. It would be a win-win situation: Pangool would
benefit a lot from being actively used by a serious open-source project
like Giraph. Of course, many details would need to be discussed. Take this
as a preliminary suggestion, just to see how it sounds. Feel free to raise
any questions or concerns you may have.

Thanks,

Pere.
