giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (GIRAPH-13) Port Giraph to YARN
Date Wed, 13 Mar 2013 23:06:15 GMT

     [ https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated GIRAPH-13:
------------------------------

    Attachment: GIRAPH-13-9-r3.patch

OK, this is ready to go: it passes mvn verify (with and without -Phadoop_yarn) and passes its
new integration tests with MiniYARNCluster.

In order to make the test cluster work, we will initially have to support only Hadoop versions
2.0.3-alpha and up. I can attempt further backports in a future JIRA.

There are no more hardcoded includes, so you need to pass the -yj option to GiraphRunner with a
comma-separated list of jar filenames (no paths) to make your job run. For instance:

{code}
mvn -Phadoop_yarn clean package

cp giraph-examples/target/giraph*-jar-with*.jar ~/hadoop/share/hadoop/giraph/

hstart # start your Hadoop-2.0.3-alpha cluster
       # AND your OWN instance of ZK on some port;
       # pass it via -ca giraph.zkList=... in the launch command below
       # if you don't use giraph-site for this!

bin/hadoop --config etc/hadoop jar \
  share/hadoop/giraph/giraph-examples-0.2-SNAPSHOT-for-hadoop-2.0.3-alpha-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner org.apache.giraph.examples.ConnectedComponentsVertex \
  -w 3 -yh 1024 \
  -yj giraph-examples-0.2-SNAPSHOT-for-hadoop-2.0.3-alpha-jar-with-dependencies.jar \
  -vif org.apache.giraph.io.formats.IntIntNullIntTextInputFormat \
  -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -vip /user/ereisman/graph3milVerts -op /user/ereisman/output
{code}

The above builds the project, then copies the giraph-examples jar with dependencies into a folder
that we assume is in or under a directory on the CLASSPATH, HADOOP_HOME, or at least your
working dir. Last, we run a connected components job (assuming we have some sample data in our
HDFS input dir and a 2.0.3 cluster up and running).
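
Since -yj wants bare jar filenames joined by commas, a small shell helper can build that value
from full paths. This is only a sketch: the function name yj_list and the sample paths are
made up for illustration and are not part of the patch.

```shell
# Hypothetical helper (not part of GIRAPH-13): strip the directory from each
# jar path and join the remaining filenames with commas, which is the shape
# the -yj option is described as expecting.
yj_list() {
  out=""
  for jar in "$@"; do
    name=$(basename "$jar")
    if [ -z "$out" ]; then
      out="$name"
    else
      out="$out,$name"
    fi
  done
  printf '%s\n' "$out"
}

# Sample paths are placeholders; prints "a.jar,b.jar".
yj_list share/hadoop/giraph/a.jar /tmp/b.jar
```

The result could then be passed as -yj "$(yj_list ...)" in the launch command.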

Right now all setStatus() calls go right into the task logs, so we didn't lose them, but they
are not aggregated in a web UI for us yet. Logs are prefixed by task number (numbered 2 higher
than the corresponding Giraph task #'s); task 1 is always our GiraphApplicationMaster.
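
To make the numbering concrete, here is a rough sketch of the mapping from a log prefix number
back to the Giraph task. It assumes my reading of the note above (log number = Giraph task # + 2,
log number 1 = application master); the function name is made up for illustration.

```shell
# Hypothetical helper: translate the task number that prefixes a log into
# the Giraph task it belongs to, assuming an offset of 2 and that log
# task 1 is the GiraphApplicationMaster.
giraph_task_for_log() {
  n=$1
  if [ "$n" -eq 1 ]; then
    echo "GiraphApplicationMaster"
  elif [ "$n" -ge 2 ]; then
    echo "Giraph task $((n - 2))"
  else
    echo "unknown log task number: $n" >&2
    return 1
  fi
}

giraph_task_for_log 1   # prints "GiraphApplicationMaster"
giraph_task_for_log 5   # prints "Giraph task 3"
```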

JIRAs I will put up related to this:

- create WebUI for Giraph

- add a process launch to GiraphApplicationMaster for our local ZK if we choose one, and put its
host:port into zkList so Giraph-BSP doesn't take over and do it.

- backport to 2.0.2-alpha, or even 2.0.0 Hadoop

- lots of strange and wonderful new things are possible, we'll see about the rest as we go
along.

                
> Port Giraph to YARN
> -------------------
>
>                 Key: GIRAPH-13
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-13
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Eli Reisman
>         Attachments: GIRAPH-13-1.patch, GIRAPH-13-2.patch, GIRAPH-13-3.patch, GIRAPH-13-4.patch,
> GIRAPH-13-5.patch, GIRAPH-13-6.patch, GIRAPH-13-7.patch, GIRAPH-13-8.patch, GIRAPH-13-9.patch,
> GIRAPH-13-9-r1.patch, GIRAPH-13-9-r2.patch, GIRAPH-13-9-r3.patch
>
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop trunk, we should
> think about what it would take to separate out the graph processing bits of Giraph from the
> MR1-specific code so as to take advantage of the less-MR centric aspects of YARN, while still
> supporting both over the medium term.

