giraph-user mailing list archives

From Milinda Pathirage <mpath...@umail.iu.edu>
Subject Fwd: Some questions related to Giraph Pure YARN implementation
Date Tue, 15 Oct 2013 20:14:36 GMT
Forwarding to user list.

---------- Forwarded message ----------
From: Milinda Pathirage <mpathira@umail.iu.edu>
Date: Tue, Oct 15, 2013 at 3:23 PM
Subject: Some questions related to Giraph Pure YARN implementation
To: dev@giraph.apache.org


Hi Eli,

I tried the scripts (giraph, giraph-env) in the bin directory to run the
Giraph sample from the quick start guide, but I ran into some issues
and had to patch a few things to get it partly working (job
submission works, but execution fails). Below are the things I
noticed:

  1. The giraph script in the 'bin' directory uses the -libjars option, but
this doesn't work with GiraphYarnClient. It should be -yj.
  2. We need to add $GIRAPH_HOME + $VERTEX_IMPL_JAR_DIR (the directory
containing the vertex implementation jar) to CLASSPATH manually because of
the way YarnUtils.getLocalFiles is implemented. Basically, we have to add
the parent directories of the YARN jars to the classpath. I am not sure
which is the correct solution:
     * fixing getLocalFiles
     * the CLASSPATH-based method
  3. The YarnUtils.populateJars method uses fileNames.contains(f.getName)
to decide whether to add a jar to the local resource map. But if we use the
giraph script, fileNames contains the absolute paths of the 'Yarn Lib Jars',
so the check never matches. I got this working by comparing absolute paths
instead of using getName.
  4. After the above changes we can successfully launch a job on a YARN
cluster using the giraph script, but the job fails due to a file path issue.
When submitting a job we serialize the Giraph configuration to
giraph-conf.xml. The "giraph.yarn.libjars" property in it contains the list
of files as absolute paths on the client machine that was used to submit
the job. For example, in my scenario the giraph jar is
"/Users/mpathira/giraph-bin/giraph-0.2-SNAPSHOT-for-hadoop-2.0.6-alpha/giraph-0.2-SNAPSHOT-for-hadoop-2.0.6-alpha.jar".
GiraphApplicationMaster then tries to access these files and fails
because no file with that name exists in HDFS.
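
To make item 3 concrete, here is a minimal sketch of the mismatch and of
one way to make the check independent of how the jars were listed. This is
not the actual YarnUtils code: the class JarNameMatch and both of its
methods are hypothetical, and it only assumes that the configured entries
may be either bare jar names or absolute paths.

```java
import java.io.File;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Giraph.
final class JarNameMatch {
    // Reduce every configured entry to its base file name, so that both
    // "foo.jar" and "/some/dir/foo.jar" end up as "foo.jar".
    static Set<String> baseNames(List<String> configured) {
        Set<String> names = new HashSet<>();
        for (String entry : configured) {
            names.add(new File(entry.trim()).getName());
        }
        return names;
    }

    // A candidate jar found on disk matches if its base name was configured,
    // regardless of the directory it was listed under.
    static boolean shouldAdd(Set<String> baseNames, File candidate) {
        return baseNames.contains(candidate.getName());
    }
}
```

With a configured list such as "/opt/giraph/lib/giraph-core.jar", a plain
contains("giraph-core.jar") on the list is false, while the base-name
comparison above matches. The trade-off is that two jars with the same
file name in different directories can no longer be told apart.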

If we use only jar names instead of paths for the 'yarnjars' option, we
should be able to fix 4, but I am not sure whether that is the correct
approach. Maybe we need to change how we serialize giraph-conf.xml
into HDFS, so that it uses HDFS paths instead of paths from the client
machine.
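
As a rough illustration of the second idea, the sketch below rewrites a
list of client-side jar paths so that each entry points into a remote
directory instead. It is plain string handling with no Hadoop API calls.
The class LibJarsRewriter, the method toRemote, and the example HDFS
directory are all hypothetical, and it assumes the property value is a
comma-separated list.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Giraph.
final class LibJarsRewriter {
    // Replace the directory part of every entry with remoteDir, keeping
    // only the jar's base name. Assumes a comma-separated input list.
    static String toRemote(String libJars, String remoteDir) {
        String base = remoteDir.endsWith("/") ? remoteDir : remoteDir + "/";
        List<String> rewritten = new ArrayList<>();
        for (String entry : libJars.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            rewritten.add(base + new File(trimmed).getName());
        }
        return String.join(",", rewritten);
    }
}
```

For example, toRemote("/Users/me/lib/giraph.jar,my-vertex.jar",
"hdfs://namenode/tmp/job-jars") would give entries under
hdfs://namenode/tmp/job-jars/ for both jars. Whether the rewrite belongs
in the client before giraph-conf.xml is written, or in the application
master when it reads the property, is the open question.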

@Eli
I would really appreciate your comments on the above. I can create a JIRA
ticket if needed.

Thanks
Milinda

--
Milinda Pathirage

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org


