giraph-user mailing list archives

From Pankaj Malhotra
Subject IllegalStateException getNextChannel with large number of vertices
Date Thu, 13 Feb 2014 10:43:35 GMT

I am using Giraph 1.0.0 with Hadoop 1.0.0.

My cluster has 4 nodes with 32 processors each.

I am using 24 workers with default checkpointing settings.

My implementation works fine for <0.3M vertices, 2.0M edges> but fails
on a dataset with <1.5M vertices, 10.3M edges> with the following errors.

*Error on each worker:*

java.lang.IllegalStateException: run: Caught an unrecoverable
exception getNextChannel: Failed to connect to hadoop2:30000 in 1000
connect attempts
	at org.apache.hadoop.mapred.MapTask.runNewMapper(
	at org.apache.hadoop.mapred.Child$
	at Method)
	at org.apache.hadoop.mapred.Child.main(
Caused by: java.lang.IllegalStateException: getNextChannel: Failed to
connect to hadoop2:30000 in 1000 connect attempts
	at org.apache.giraph.comm.netty.NettyClient.getNextChannel(
	at org.apache.giraph.comm.netty.NettyClient.sendWritableRequest(
	at org.apache.giraph.comm.netty.NettyWorkerClient.sendWritableRequest(
	at org.apache.giraph.comm.netty.NettyWorkerAggregatorRequestProcessor.sendAggregatedValuesToMaster(
	at org.apache.giraph.worker.WorkerAggregatorHandler.finishSuperstep(
	at org.apache.giraph.worker.BspServiceWorker.finishSuperstep(
	at org.apache.giraph.graph.GraphTaskManager.completeSuperstepAndCollectStats(
	at org.apache.giraph.graph.GraphTaskManager.execute(
	... 7 more

*Error on Master:*

2014-02-13 13:21:13,889 INFO
org.apache.giraph.partition.PartitionUtils: analyzePartitionStats:
Edges - Mean: 427869, Min: Worker(hostname=hadoop5, MRtaskID=18,
port=30018) - 422292, Max: Worker(hostname=hadoop4, MRtaskID=19,
port=30019) - 433205
2014-02-13 13:21:13,967 INFO
org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out
of 24 workers finished on superstep 8 on path
2014-02-13 13:31:18,353 ERROR
org.apache.giraph.master.BspServiceMaster: superstepChosenWorkerAlive:
Missing chosen worker Worker(hostname=hadoop5, MRtaskID=6, port=30006)
on superstep 8
2014-02-13 13:31:18,362 INFO org.apache.giraph.master.MasterThread:
masterThread: Coordination of superstep 8 took 604.513 seconds ended
with state WORKER_FAILURE and is now on superstep 8
2014-02-13 13:31:19,663 ERROR org.apache.giraph.master.MasterThread:
masterThread: Master algorithm failed with
java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(
2014-02-13 13:31:19,679 FATAL org.apache.giraph.graph.GraphMapper:
uncaughtException: OverrideExceptionHandler on thread
org.apache.giraph.master.MasterThread, msg =
java.lang.ArrayIndexOutOfBoundsException: -1, exiting...
java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: -1
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(
2014-02-13 13:31:19,803 INFO org.apache.giraph.zk.ZooKeeperManager:
run: Shutdown hook started.
2014-02-13 13:31:19,803 WARN org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper
2014-02-13 13:31:20,276 INFO org.apache.zookeeper.ClientCnxn: Unable
to read additional data from server sessionid 0x1442a26dbf10000,
likely server has closed socket, closing socket connection and
attempting reconnect
2014-02-13 13:31:20,366 INFO org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: ZooKeeper process exited with 143 (note that
143 typically means killed).


Please let me know if any additional details are required.



IIT Delhi
