giraph-user mailing list archives

From Stefan Beskow <Stefan.Bes...@sas.com>
Subject RE: Run SimpleShortestPathsVertex sample application using multiple workers
Date Wed, 26 Feb 2014 15:14:55 GMT
Here is my understanding of Giraph, but Giraph experts, please correct me if this is wrong.
Giraph loads Hadoop configuration information from files in the folder /etc/hadoop/conf. One of
the properties Giraph looks for in mapred-site.xml is called mapred.job.tracker. If this property
is not defined, or if it's set to "local", Giraph assumes that the Hadoop local job runner is
used. When the class org.apache.giraph.job.GiraphJob executes, it checks whether the mapred.job.tracker
property is set to "local". If it is, GiraphJob makes sure that the number of workers is set to
one and that the split master worker property is set to false; if either condition fails, it throws
an exception indicating that the arguments are not valid for the local job runner.
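
A minimal sketch of the check described above, written in plain Java with no Hadoop or Giraph dependencies. The class name, method names, and exception messages are hypothetical stand-ins chosen for illustration; this is not the actual GiraphJob source, only the logic as I understand it.

```java
// Sketch only: a plain-Java stand-in for the check described above.
// Class and method names are hypothetical, not Giraph's actual API.
final class LocalRunnerCheckSketch {

    /** Treats an undefined property the same as "local", as described above. */
    static boolean usesLocalJobRunner(String jobTracker) {
        return jobTracker == null || "local".equals(jobTracker);
    }

    /** Throws if the settings cannot work with the local job runner. */
    static void check(String jobTracker, int workers, boolean splitMasterWorker) {
        if (!usesLocalJobRunner(jobTracker)) {
            return; // a real job tracker is configured, nothing to check
        }
        if (workers != 1) {
            throw new IllegalArgumentException(
                "local job runner: must have only one worker");
        }
        if (splitMasterWorker) {
            throw new IllegalArgumentException(
                "local job runner: cannot split master and worker");
        }
    }

    public static void main(String[] args) {
        check("el01cn16.unx.sas.com", 10, true); // accepted: not local
        check(null, 1, false);                   // accepted: local, one worker
        try {
            check(null, 2, true);                // rejected: local with 2 workers
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The third call mirrors the failing case from the original question: no job tracker visible to Giraph, combined with -w 2.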

I have access to three Cloudera clusters (CDH4.2, CDH4.5 and CDH5.0), and on each cluster
/etc/hadoop/conf/mapred-site.xml doesn't contain a property called mapred.job.tracker. However,
that property is defined in the Cloudera Manager console. So, to tell Giraph what that value
is, I simply added another command line parameter, -Dmapred.job.tracker, to my Giraph command,
and that solved the problem.

Here is the full command:
hadoop jar /users/stbesk/snapshot_from_git/jars/giraph-ex.jar org.apache.giraph.GiraphRunner
-Dmapred.job.tracker=el01cn16.unx.sas.com org.apache.giraph.examples.SimpleShortestPathsComputation
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/stbesk/input/tiny-graph.txt
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/stbesk/output/shortestPath
-w 10 -ca giraph.SplitMasterWorker=true -ca giraph.zkList=el01cn16.unx.sas.com:2181
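
Hadoop's generic options let a -Dproperty=value argument set a configuration property from the command line. The sketch below imitates that idea in plain Java to show why the extra parameter helps when the property is missing from the files. It is a simplified stand-in with hypothetical names, not Hadoop's own option parser, and it only handles the -Dkey=value form used in the command above.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: imitates how -Dkey=value arguments can override or add to
// settings loaded from configuration files. Names are hypothetical.
final class DashDOverrideSketch {

    static Map<String, String> applyOverrides(Map<String, String> fromFiles,
                                              String[] args) {
        Map<String, String> conf = new HashMap<>(fromFiles);
        for (String arg : args) {
            if (arg.startsWith("-D")) {
                int eq = arg.indexOf('=');
                if (eq > 2) { // require a non-empty key between "-D" and "="
                    conf.put(arg.substring(2, eq), arg.substring(eq + 1));
                }
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        // Simulates a mapred-site.xml that lacks mapred.job.tracker.
        Map<String, String> fromFiles = new HashMap<>();
        String[] cli = {"-Dmapred.job.tracker=el01cn16.unx.sas.com", "-w", "10"};
        Map<String, String> conf = applyOverrides(fromFiles, cli);
        System.out.println(conf.get("mapred.job.tracker"));
    }
}
```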

Cheers.
Stefan

From: Stefan Beskow
Sent: Monday, February 24, 2014 12:36 AM
To: 'user@giraph.apache.org'
Subject: Run SimpleShortestPathsVertex sample application using multiple workers

Hi.

I'm trying to run Giraph on Hadoop 2.0.0-cdh4.2.0 using a cluster with 60 nodes. When I run
the sample application org.apache.giraph.examples.SimpleShortestPathsVertex with just 1 worker
it works fine, but when I specify more than 1 worker it throws exception java.lang.IllegalArgumentException:
checkLocalJobRunnerConfiguration as shown below. Is there a way to pass a command line parameter
to Giraph so that it doesn't use the local job runner or do I need to update any of the Hadoop
configuration files for this to work?

Here is the command I use to run the sample application with 2 workers:
hadoop jar giraph-examples-1.0.0-for-hadoop-2.0.0-cdh4.2.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner
-Dgiraph.zkList=rdcgrd001.unx.sas.com:2181 -libjars giraph-examples-1.0.0-for-hadoop-2.0.0-cdh4.2.0-jar-with-dependencies.jar
org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
-vip /user/stbesk/input/tiny_graph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat
-op /user/stbesk/output/shortestpathsC2 -ca SimpleShortestPathsVertex.source=2 -w 2 -ca giraph.SplitMasterWorker=true

Here is the exception:
14/02/24 00:20:23 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/02/24 00:20:23 INFO utils.ConfigurationUtils: Setting custom argument [SimpleShortestPathsVertex.source] to [2] in GiraphConfiguration
14/02/24 00:20:23 INFO utils.ConfigurationUtils: Setting custom argument [giraph.SplitMasterWorker] to [true] in GiraphConfiguration
14/02/24 00:20:23 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
14/02/24 00:20:23 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
14/02/24 00:20:23 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
14/02/24 00:20:23 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
Exception in thread "main" java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, must have only one worker since only 1 task at a time!
        at org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:151)
        at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:225)
        at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:94)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Appreciate any help.

Thanks.
Stefan


