giraph-user mailing list archives

From Matthew Laird <lai...@sfu.ca>
Subject Re: Fwd: Some questions related to Giraph Pure YARN implementation
Date Tue, 15 Oct 2013 23:18:12 GMT
Hmm, sounds like Giraph + YARN is definitely on the bleeding edge...
Thanks for all the work you folks are doing to get it working. I guess
I'll lurk on the dev list for a while until you guys figure the pieces
out. :)

Thanks again!

On 13-10-15 01:14 PM, Milinda Pathirage wrote:
> Forwarding to user list.
>
> ---------- Forwarded message ----------
> From: Milinda Pathirage <mpathira@umail.iu.edu>
> Date: Tue, Oct 15, 2013 at 3:23 PM
> Subject: Some questions related to Giraph Pure YARN implementation
> To: dev@giraph.apache.org
>
>
> Hi Eli,
>
> I tried the scripts (giraph, giraph-env) in the bin directory to run
> the Giraph sample from the quick start guide, but I ran into some
> issues and had to do some patching to get it partly working (job
> submission works, but execution fails). Below are the things I
> noticed:
>
>    1. The giraph script in the 'bin' directory uses the -libjars
> option, but this doesn't work with GiraphYarnClient. It should be -yj.
>    2. We need to add $GIRAPH_HOME + $VERTEX_IMPL_JAR_DIR (the directory
> containing the vertex implementation jar) to CLASSPATH manually because
> of the way YarnUtils.getLocalFiles is implemented. Basically, we have to
> add the parent directories of the YARN jars to the classpath. I am not
> sure which is the correct solution:
>       * fixing getLocalFiles
>       * the CLASSPATH-based method
>    3. The YarnUtils.populateJars method uses
> fileNames.contains(f.getName) to decide whether to add a jar to the
> local resource map. But if we use the giraph script, fileNames contains
> the absolute paths of the 'Yarn Lib Jars', so the check never matches.
> I got this working by using absolute paths instead of getName.
>    4. After the above changes we can successfully launch a job on a
> YARN cluster using the giraph script, but the job fails due to a file
> path issue. When submitting a job we serialize the Giraph configuration
> to giraph-conf.xml. The "giraph.yarn.libjars" property in it contains a
> list of files, but with absolute paths from the client machine that was
> used to submit the job. For example, in my scenario the giraph jar is
> "/Users/mpathira/giraph-bin/giraph-0.2-SNAPSHOT-for-hadoop-2.0.6-alpha/giraph-0.2-SNAPSHOT-for-hadoop-2.0.6-alpha.jar".
> GiraphApplicationMaster tries to access these files and fails
> because no file exists in HDFS under that name.
>
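
The mismatch described in item 3 can be illustrated with a small, self-contained sketch. This is not Giraph's actual code: the class JarMatchSketch and its methods are hypothetical, and java.io.File stands in for whatever file type populateJars really iterates over. It only shows why a bare-name lookup misses when the requested list holds absolute paths, and one possible normalization (reducing both sides to base names) that accepts either form.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not taken from the Giraph source tree.
class JarMatchSketch {

    /** Reduce every requested entry (bare name or absolute path) to its base name. */
    public static Set<String> baseNames(Collection<String> requested) {
        Set<String> names = new HashSet<>();
        for (String entry : requested) {
            names.add(new File(entry).getName());
        }
        return names;
    }

    /** True if the candidate's base name is among the normalized requested names. */
    public static boolean isRequested(Set<String> requestedBaseNames, File candidate) {
        return requestedBaseNames.contains(candidate.getName());
    }

    public static void main(String[] args) {
        // Assumed example values: one absolute path, one bare jar name.
        Set<String> requested = new HashSet<>(
            Arrays.asList("/opt/giraph/lib/giraph-core.jar", "my-vertex.jar"));
        File candidate = new File("/opt/giraph/lib/giraph-core.jar");

        // Comparing a bare name against a set of absolute paths misses.
        System.out.println(requested.contains(candidate.getName())); // false

        // Comparing base names on both sides matches.
        System.out.println(isRequested(baseNames(requested), candidate)); // true
    }
}
```

Matching on base names alone would treat two different jars with the same file name as identical, so comparing absolute paths on both sides, as the message describes, may be the safer choice where that matters.
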
> If we only use jar names instead of paths for the 'yarnjars' option,
> we should be able to fix 4. But I am not sure whether that is the
> correct approach. Maybe we need to change how we serialize
> giraph-conf.xml into HDFS: we could use HDFS paths instead of paths
> from the client machine.
>
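
The two options weighed above (bare jar names versus HDFS paths) can be sketched as plain string rewriting of the property value. Everything here is an assumption for illustration: the class LibJarsRewriteSketch is hypothetical, the value is assumed to be a comma-separated list of '/'-separated paths, and the HDFS directory is a made-up example. It does not touch Giraph or Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from the Giraph source tree.
class LibJarsRewriteSketch {

    /** Last path component, e.g. "/a/b/c.jar" -> "c.jar". Assumes '/' separators. */
    public static String baseName(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    /** Option A: keep only the jar names from a comma-separated list of paths. */
    public static String toBareNames(String libJars) {
        List<String> out = new ArrayList<>();
        for (String entry : libJars.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                out.add(baseName(trimmed));
            }
        }
        return String.join(",", out);
    }

    /** Option B: re-root every jar under a remote directory (assumed upload location). */
    public static String toRemotePaths(String libJars, String remoteDir) {
        String dir = remoteDir.endsWith("/") ? remoteDir : remoteDir + "/";
        List<String> out = new ArrayList<>();
        for (String name : toBareNames(libJars).split(",")) {
            if (!name.isEmpty()) {
                out.add(dir + name);
            }
        }
        return String.join(",", out);
    }

    public static void main(String[] args) {
        // Assumed client-side value with absolute local paths.
        String clientValue = "/Users/alice/giraph/giraph-core.jar, /Users/alice/app/my-vertex.jar";

        System.out.println(toBareNames(clientValue));
        // prints: giraph-core.jar,my-vertex.jar

        System.out.println(toRemotePaths(clientValue, "hdfs:///user/alice/giraph-libs"));
        // prints: hdfs:///user/alice/giraph-libs/giraph-core.jar,hdfs:///user/alice/giraph-libs/my-vertex.jar
    }
}
```

Either rewrite would only help if the jars are actually uploaded to the location the application master later reads from; the sketch says nothing about where that is.
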
> @Eli
> I would really appreciate your comments on the above. I can create a
> JIRA ticket if needed.
>
> Thanks
> Milinda
>
> --
> Milinda Pathirage
>
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org
>
>

-- 
Matthew Laird
Lead Software Developer, Bioinformatics
Brinkman Laboratory
Simon Fraser University, Burnaby, BC, Canada
