giraph-user mailing list archives

From Alok Kumbhare <kumbh...@usc.edu>
Subject Multi-threading in giraph (Exception while using giraph.userPartitionCount)
Date Fri, 20 Sep 2013 20:19:49 GMT
Hi,
I am trying to run multi-threaded Giraph workers. This is the command that
I use:

hadoop jar
giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner
org.apache.giraph.examples.ConnectedComponentsComputation -vif
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
-vip in/road-template -vof
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
out/cc_mt_road4 -w 24 -ca
giraph.numComputeThreads=4,giraph.userPartitionCount=4

We have a 12-node cluster with 8 cores per node. I am running 24 workers and
want each worker to be multi-threaded, so that multiple vertices are
processed in parallel on a single node.
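
As a quick sanity check on these numbers, here is a minimal sketch of the resulting compute-thread count per node. It assumes the 24 workers are spread evenly across the 12 nodes, which is an assumption: the Hadoop scheduler decides the actual placement. The variable names are illustrative only and are not Giraph options.

```shell
#!/bin/sh
# Figures taken from this message.
nodes=12
cores_per_node=8
workers=24               # -w 24
threads_per_worker=4     # giraph.numComputeThreads=4

# Assumption: workers are distributed evenly over the nodes.
workers_per_node=$(( workers / nodes ))
threads_per_node=$(( workers_per_node * threads_per_worker ))

echo "workers per node: $workers_per_node"
echo "compute threads per node: $threads_per_node (cores per node: $cores_per_node)"
```

Under that assumption, the requested compute threads per node match the number of cores per node.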

Another thread suggested setting userPartitionCount=<threadcount> so that
each thread works on a different partition.

However, when I do that, I get the following exception:
java.lang.IllegalStateException: run: Caught an unrecoverable exception null
 at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.NullPointerException
 at org.apache.giraph.comm.SendCache.<init>(SendCache.java:100)
 at org.apache.giraph.comm.SendEdgeCache.<init>(SendEdgeCache.java:50)
 at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.<init>(NettyWorkerClientRequestProcessor.java:128)
 at org.apache.giraph.worker.InputSplitsCallable.<init>(InputSplitsCallable.java:104)
 at org.apache.giraph.worker.VertexInputSplitsCallable.<init>(VertexInputSplitsCallable.java:98)
 at org.apache.giraph.worker.VertexInputSplitsCallableFactory.newCallable(VertexInputSplitsCallableFactory.java:80)
 at org.apache.giraph.worker.VertexInputSplitsCallableFactory.newCallable(VertexInputSplitsCallableFactory.java:37)
 at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:213)
 at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:283)
 at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:327)
 at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:508)
 at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:246)
 at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:91)
 ... 7 more

When I run the command without giraph.userPartitionCount=4 and specify only
-ca giraph.numComputeThreads=4, I don't see any performance improvement.

Please suggest the correct way to use multi-threading, or point me to
documentation that covers it.

Thanks,
Alok Kumbhare
