giraph-dev mailing list archives

From "Eli Reisman" <>
Subject Re: Review Request: Refactor GraphMapper
Date Thu, 10 Jan 2013 19:02:18 GMT

(Updated Jan. 10, 2013, 7:02 p.m.)

Review request for giraph.


This rebases the patch to trunk and does a bit more refactoring of what was the BSP code
inside GraphMapper. As of this patch, that code is encapsulated in GraphTaskManager, since
this class manages all the Giraph-centric work, all the IPC objects, etc. The setup()
and execute() methods (as called from GraphMapper when running on a Hadoop cluster) were long.
Now they are short(er), around a page or less each, and there are LOTS of private utility
methods below them in the file. My hope was simply to make it easier for a reader to 1. find
the important parts in GraphTaskManager/GraphMapper and 2. follow which "events"
in the job run happen when and where, leaving them to dig into the utility methods should
they need more detail to tweak something. Not beautiful yet, but IMHO an improvement. More
dramatic refactors are on the table, or could wait for later.
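
To make the intended shape concrete, here is a minimal Java sketch of short setup() and
execute() methods that read as a list of job "events" and delegate to private utility
methods. Only the names GraphTaskManager, setup() and execute() come from the description
above; the class name TaskManagerSketch, every helper name, and the two-superstep stopping
rule are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration; not the GraphTaskManager from the patch. */
final class TaskManagerSketch {
  /** Records each "event" so the order of the job run is visible. */
  private final List<String> events = new ArrayList<>();

  /** Short top-level method: one line per setup event. */
  void setup() {
    checkConfiguration();
    startCoordination();
  }

  /** Short top-level method: the superstep loop, then cleanup. */
  void execute() {
    int superstep = 0;
    while (!isFinished(superstep)) {
      runSuperstep(superstep);
      superstep++;
    }
    cleanup();
  }

  List<String> events() {
    return events;
  }

  // ---- private utility methods (all names are assumptions) ----

  private void checkConfiguration() {
    events.add("checkConfiguration");
  }

  private void startCoordination() {
    events.add("startCoordination");
  }

  /** Assumed stopping rule for the sketch: halt after two supersteps. */
  private boolean isFinished(int superstep) {
    return superstep >= 2;
  }

  private void runSuperstep(int superstep) {
    events.add("superstep " + superstep);
  }

  private void cleanup() {
    events.add("cleanup");
  }
}
```

A caller in the role of GraphMapper would invoke setup() and then execute(); the recorded
events then list the run in order, and a reader only opens a helper when they need its detail.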

My other goal here was to begin to split out Giraph BSP code from its reliance on Hadoop-specific
framework interfaces. This does not mean my goal is to remove o.a.hadoop classes from our
code, but to stop directly calling objects that have no interface (like Mapper#Context) from
within Giraph code, since that ties us to a Hadoop cluster. This will make our internal use of
many Hadoop classes more a "utility library of choice" than a direct dependency on a Hadoop MR
cluster to run.
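
As a rough illustration of that kind of decoupling, the Java sketch below puts a small
interface between BSP code and the cluster framework. Everything in it is an assumption
made for the example, not code from the patch: the names TaskContextSketch,
LocalTaskContext and BspLoopSketch, the three interface methods, and the configuration key
"sketch.max.supersteps" are all hypothetical. The Hadoop-backed adapter is described only
in a comment so the example compiles without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interface: the only view of the cluster framework that BSP code sees. */
interface TaskContextSketch {
  String getConf(String key, String defaultValue);
  void setStatus(String status);
  void progress();
}

/** In-memory implementation, e.g. for tests or a non-Hadoop runner (an assumption). */
final class LocalTaskContext implements TaskContextSketch {
  private final Map<String, String> conf = new HashMap<>();
  private String status = "";
  private int progressCalls = 0;

  LocalTaskContext set(String key, String value) {
    conf.put(key, value);
    return this;
  }

  @Override public String getConf(String key, String defaultValue) {
    return conf.getOrDefault(key, defaultValue);
  }

  @Override public void setStatus(String status) {
    this.status = status;
  }

  @Override public void progress() {
    progressCalls++;
  }

  String status() { return status; }
  int progressCalls() { return progressCalls; }
}

// On Hadoop, a thin adapter could implement TaskContextSketch by delegating to
// Mapper.Context: getConfiguration().get(key, defaultValue), setStatus(...), progress().
// It is omitted here so the sketch stays free of Hadoop imports.

/** Hypothetical BSP-side code: depends on the interface, never on Mapper.Context. */
final class BspLoopSketch {
  private final TaskContextSketch context;

  BspLoopSketch(TaskContextSketch context) {
    this.context = context;
  }

  /** Runs the configured number of supersteps, reporting status and liveness each time. */
  int run() {
    int supersteps = Integer.parseInt(context.getConf("sketch.max.supersteps", "3"));
    for (int i = 0; i < supersteps; i++) {
      context.setStatus("superstep " + i);
      context.progress();
    }
    return supersteps;
  }
}
```

With this shape, swapping the cluster framework means writing another implementation of
the interface, while the loop in BspLoopSketch stays unchanged.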

I put a few more details on the JIRA page for GIRAPH-469. Next up, ZkManager and Mapper#Context
interfaces for Giraph to use. If anyone thinks I'm off to the wrong start on this, speak up
now and save me some work! ;)


Longer description available at the JIRA site. Short version: cleans up and refactors GraphMapper.
Begins a multistage process of setting up the underlying cluster framework (MRv1, MRv2 + YARN,
pure YARN, other cluster management platform...) to be decoupled from Giraph's BSP business
logic. After this patch, the main connection to Hadoop left in the processing code is the
Mapper#Context, and the various references it publishes into Giraph from Hadoop. Later JIRAs
will include more interfaces and a cleaner decoupling. Again, see the JIRA for more details.

Diffs (updated)

  giraph-core/src/main/java/org/apache/giraph/bsp/ 5c44030 
  giraph-core/src/main/java/org/apache/giraph/bsp/ 56a8288 
  giraph-core/src/main/java/org/apache/giraph/conf/ df7b80e 
  giraph-core/src/main/java/org/apache/giraph/graph/ 3292517 
  giraph-core/src/main/java/org/apache/giraph/graph/ PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/graph/ dd4dee4 
  giraph-core/src/main/java/org/apache/giraph/graph/ ee9e6a8 
  giraph-core/src/main/java/org/apache/giraph/graph/ PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/graph/ f5909d3 
  giraph-core/src/main/java/org/apache/giraph/master/ 4483385 
  giraph-core/src/main/java/org/apache/giraph/master/ cdb9e85 
  giraph-core/src/main/java/org/apache/giraph/metrics/ 6052fd8 
  giraph-core/src/main/java/org/apache/giraph/metrics/ 8fec14d
  giraph-core/src/main/java/org/apache/giraph/vertex/ d1fbe14 
  giraph-core/src/main/java/org/apache/giraph/worker/ f33fe58 
  giraph-core/src/main/java/org/apache/giraph/worker/ ec4780e 
  giraph-core/src/main/java/org/apache/giraph/zk/ PRE-CREATION




Eli Reisman
