giraph-dev mailing list archives

From "Eli Reisman" <initialcont...@gmail.com>
Subject Re: Review Request: GIRAPH-13: Port Giraph to YARN
Date Sat, 16 Mar 2013 07:22:53 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9811/
-----------------------------------------------------------

(Updated March 16, 2013, 7:22 a.m.)


Review request for giraph.


Changes
-------

Hey guys. This fixes a small bug in which the integration test's MiniYARNCluster borrowed
same-named test dirs from running test instances of InternalVertexRunner, which also caused
port collisions on 22181 and 22182. All fixed. Ran it a bunch of times and it seems to be
working well now.
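
For anyone chasing a similar collision locally, a quick pre-flight check along these lines
can show whether something is already listening on those two ports. This is only a sketch:
the port_in_use helper is hypothetical (not part of Giraph), and it relies on bash's /dev/tcp
redirection, so it assumes bash rather than plain sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Giraph: returns 0 if something accepts
# TCP connections on host:port, non-zero otherwise. Relies on bash's /dev/tcp.
port_in_use() {
  local host="$1" port="$2"
  # Open the socket in a subshell so the descriptor is closed on exit.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# The two ports named in the note above.
for port in 22181 22182; do
  if port_in_use 127.0.0.1 "$port"; then
    echo "port $port is already in use"
  else
    echo "port $port looks free"
  fi
done
```

If either port shows as in use, another test instance is probably still running.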

The command line I gave in the last RB post here is a bit bonkers. Here's an example I used
today to run connected components on a 3-million-vertex synthetic graph. It took about 52
seconds on average:

{code}
# From your giraph source tree, assuming Hadoop-2.0.3-alpha is up and you can run wordcount on it.
# Your Hadoop cluster can be a local single node or a real one.
mvn -Phadoop_yarn clean package

cp giraph-examples/target/giraph-examples*.jar ~/hadoop/share/hadoop/giraph/

cd ~/hadoop

# This will instantiate 5 YARN container processes: 1 Application Master to manage the job,
# 1 Master node, and 3 Worker nodes.
# As always, you pick the # of workers with the -w option, but here we always keep in mind
# there will be 2 more processes: the master and the app master.
# Remember this if the cluster fails due to lack of memory: allocate few enough workers that
# memory is left for a master with the same amount of heap as set in the -yh option, AND for
# an app master running on a gig of memory too.
hadoop --config etc/hadoop jar \
  share/hadoop/giraph/giraph-examples-0.2-SNAPSHOT-for-hadoop-2.0.3-alpha-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner org.apache.giraph.examples.ConnectedComponentsVertex \
  -w 3 \
  -yj giraph-examples-0.2-SNAPSHOT-for-hadoop-2.0.3-alpha-jar-with-dependencies.jar \
  -yh 1024 \
  -vif org.apache.giraph.io.formats.IntIntNullIntTextInputFormat \
  -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -vip graph3mil -op demoOutput8
{code}
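
To make the container arithmetic in the comments above concrete, here is a rough sketch. The
helper names are made up for illustration. It assumes, per those comments, that the master
gets the same heap as each worker (the -yh value, taken here to be megabytes) and that the
app master needs about 1024 MB. It counts heap only, so real container requests may come out
larger.

```shell
# Hypothetical helpers, not part of Giraph.

# Containers launched: the workers, plus 1 master and 1 app master.
total_containers() {
  local workers="$1"
  echo $(( workers + 2 ))
}

# Rough heap needed in MB: (workers + master) * heap + app master memory.
# am_mb defaults to 1024, an assumption based on "a gig" in the comments.
total_heap_mb() {
  local workers="$1" heap_mb="$2" am_mb="${3:-1024}"
  echo $(( (workers + 1) * heap_mb + am_mb ))
}

# The example run above: -w 3 -yh 1024
echo "containers: $(total_containers 3)"
echo "heap MB:    $(total_heap_mb 3 1024)"
```

For the run above this gives 5 containers and 5120 MB, so a single-node cluster with less
memory than that available to YARN would need a smaller -w or -yh.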


Description
-------

Port Giraph to "pure YARN" clusters, using Hadoop MapReduce classes in our code (IO formats
etc.) but running the cluster job without any active participation by a running MapReduce
framework. This means doing some things ourselves that Hadoop used to do for us.

I am putting this up for review to aid some non-Giraphers in having a peek at the YARN component.
There is a bit of latency in the job launch that I am still diagnosing. I am also still finishing
up an integration test to verify the YARN components can run a no-op Giraph job successfully.
All BSP code is covered by our MRv1 tests, which are sufficient since once Giraph is running,
it does not know or care if it's running on YARN. The grand total is TWO files with FOUR actual
munges for the entire patch. All the rest is conditionally compiled and/or manipulated
through conf settings without ever calling into YARN-specific code from inside Giraph. This
will allow us to hold off on ripping apart our IO formats or other MRv1 baked-in dependencies
until we're ready to abandon MR. It also sets up a paradigm by which it will be easy to
port us to other cluster frameworks (Mesos, etc.).

I will ping Giraph folks when this is really ready for review (hopefully next day or so) but
feel free to drop me a line now if you see something you are curious about or just plain don't
like. The sooner I fix it, the sooner this gets committed, so please speak up if you do.

My goal is to make this not only our port to YARN, but another (there aren't many) good and
well-commented example of how to run "real applications" like Giraph on YARN clusters. So
I'm hoping it's clear and easy to follow on that level too. Happy to hear feedback on that
angle as well!

Thanks! Will post a wiki page explaining a bit more about this when it's all finished. This
version still depends on Hadoop-2.0.3-alpha, but I will attempt to back port to 2.0.2
before I'm done, and a future JIRA should bring us to 2.0.0 or higher (and trunk of course).
 


Diffs (updated)
-----

  checkstyle.xml 3d8a6d4 
  giraph-core/pom.xml 3580d0c 
  giraph-core/src/main/java/org/apache/giraph/GiraphRunner.java 5bd5686 
  giraph-core/src/main/java/org/apache/giraph/bsp/BspInputFormat.java bce84b1 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 6886d58 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java ad9073d 
  giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java e74c59a 
  giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java 87497b8 
  giraph-core/src/main/java/org/apache/giraph/utils/ConfigurationUtils.java 41238d0 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java 74c1f87 
  giraph-core/src/main/java/org/apache/giraph/yarn/GiraphApplicationMaster.java PRE-CREATION
  giraph-core/src/main/java/org/apache/giraph/yarn/GiraphYarnClient.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/yarn/GiraphYarnTask.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/yarn/YarnUtils.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/yarn/package-info.java PRE-CREATION 
  giraph-core/src/test/java/org/apache/giraph/yarn/TestYarnJob.java PRE-CREATION 
  giraph-core/src/test/resources/capacity-scheduler.xml PRE-CREATION 
  giraph-examples/pom.xml 3b6a08c 
  pom.xml 8d29304 

Diff: https://reviews.apache.org/r/9811/diff/


Testing
-------

Getting there; an in-progress integration test is included for your amusement.


Thanks,

Eli Reisman

