Hi Rob:

I had to do all the steps you mentioned. Specifically, at bootstrap I run a Bash script stored on S3, like this:

--core-key-value, giraph.zkList=localhost:2181, --mapred-key-value, mapreduce.job.counters.limit=1200

Then, in the steps configuration, I start by setting up Giraph and ZooKeeper by calling two Bash scripts (as two separate steps):

s3://elasticmapreduce/libs/script-runner/script-runner.jar s3://mybucket/install_giraph.sh
s3://elasticmapreduce/libs/script-runner/script-runner.jar s3://mybucket/install_zookeeper.sh
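
If you want to script these two steps rather than enter them by hand, something like the following might work with the AWS CLI. Treat it as a rough sketch only: the AWS CLI itself, the script_step helper, the step names, the ActionOnFailure choice and the j-XXXXXXXX cluster ID are all assumptions for illustration and not part of the setup described here. Only the JAR and script paths come from the steps above.

```shell
#!/bin/sh
# Sketch only: builds the --steps shorthand accepted by `aws emr add-steps`.
# script_step, the step names and the cluster ID below are hypothetical.

SCRIPT_RUNNER="s3://elasticmapreduce/libs/script-runner/script-runner.jar"

# script_step NAME SCRIPT_URI
# Prints one CUSTOM_JAR step definition that runs SCRIPT_URI via script-runner.
script_step() {
  printf 'Type=CUSTOM_JAR,Name=%s,ActionOnFailure=CANCEL_AND_WAIT,Jar=%s,Args=[%s]\n' \
    "$1" "$SCRIPT_RUNNER" "$2"
}

# Example (not executed here; needs a running cluster and AWS credentials):
#   aws emr add-steps --cluster-id j-XXXXXXXX \
#     --steps "$(script_step InstallGiraph s3://mybucket/install_giraph.sh)" \
#             "$(script_step InstallZookeeper s3://mybucket/install_zookeeper.sh)"
```

Keeping the two installs as separate steps, as above, means a failed install shows up as its own failed step in the job flow.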

In install_giraph.sh I do this:

hadoop dfs -copyToLocal s3://mybucket/giraph.tar.gz /home/hadoop
tar -xzvf /home/hadoop/giraph.tar.gz -C /home/hadoop

and install_zookeeper.sh does this:

hadoop dfs -copyToLocal s3://data.clipesebandas/binaries/zookeeper.tar.gz /home/hadoop
tar -xzvf /home/hadoop/zookeeper.tar.gz -C /home/hadoop
/home/hadoop/zookeeper/bin/zkServer.sh start
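
One thing to watch with a plain script like this: unless it runs with `set -e`, a failed download does not stop the later commands, and the next step may start before ZooKeeper is actually listening. A slightly more defensive variant could look like the sketch below. The function names are made up for illustration, and it assumes that nc is installed on the node and that the server answers ZooKeeper's four-letter `ruok` command (newer ZooKeeper releases may need it enabled via 4lw.commands.whitelist). The host and port match the giraph.zkList setting above.

```shell
#!/bin/sh
# Sketch only: the same commands as install_zookeeper.sh, wrapped in helper
# functions that report failure. All function names are hypothetical.

# fetch_and_unpack SRC_URI DEST_DIR
# Copies the archive into DEST_DIR with "hadoop dfs -copyToLocal",
# then unpacks it there. Returns non-zero if either command fails.
fetch_and_unpack() {
  archive="$2/$(basename "$1")"
  hadoop dfs -copyToLocal "$1" "$2" || return 1
  tar -xzf "$archive" -C "$2"
}

# zk_ready HOST PORT
# Succeeds if the server answers "imok" to ZooKeeper's "ruok" command.
# Assumes nc is available and "ruok" is enabled on the server.
zk_ready() {
  reply="$(echo ruok | nc "$1" "$2" 2>/dev/null)"
  [ "$reply" = "imok" ]
}

# wait_for_zk HOST PORT ATTEMPTS
# Polls zk_ready up to ATTEMPTS times, sleeping one second after each miss.
wait_for_zk() {
  tries=0
  while [ "$tries" -lt "$3" ]; do
    zk_ready "$1" "$2" && return 0
    tries=$((tries + 1))
    sleep 1
  done
  return 1
}

# Example (not executed here; needs Hadoop and the archive on S3):
#   fetch_and_unpack s3://data.clipesebandas/binaries/zookeeper.tar.gz /home/hadoop &&
#   /home/hadoop/zookeeper/bin/zkServer.sh start &&
#   wait_for_zk localhost 2181 30
```

Chaining the calls with && makes the step exit non-zero as soon as one of them fails, so the failure should be visible in the step status rather than only in the logs.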

And finally I run my Giraph algorithm in another step like this:

/home/hadoop/giraph.jar org.giraph.MyGraphAlgorithm  /user/hadoop/input_graph, /user/hadoop/built_graph  20 1

Some of these steps, such as the ZooKeeper configuration, may no longer be needed, since this setup is based on Giraph 0.1.
Hope this helps.

Cheers
Gustavo



On Mon, Nov 11, 2013 at 12:43 PM, Rob Vesse <rvesse@dotnetrdf.org> wrote:
Hi All

I've been looking around for any documentation about running Giraph on Amazon Elastic MapReduce (EMR) and didn't turn up anything particularly useful.

It looks like the only real requirement for running on EMR is to add bootstrap actions to the Job Flow configuration that apply the relevant Hadoop configuration settings, e.g. increasing max map tasks. After that, it looks like I should just need a standard Custom JAR launch step to launch the Giraph Runner with appropriate arguments for my Giraph program.

Before I start trying this and incurring EC2 costs, does anyone have experience running Giraph applications on EMR that they are willing to share? Any suggestions, tips, or common pitfalls I should be aware of?

Cheers,

Rob