giraph-dev mailing list archives

From "Rob Vesse (JIRA)" <j...@apache.org>
Subject [jira] [Created] (GIRAPH-809) Worker Failure causes ArrayIndexOutOfBounds on BspServiceMaster
Date Tue, 03 Dec 2013 10:53:38 GMT
Rob Vesse created GIRAPH-809:
--------------------------------

             Summary: Worker Failure causes ArrayIndexOutOfBounds on BspServiceMaster
                 Key: GIRAPH-809
                 URL: https://issues.apache.org/jira/browse/GIRAPH-809
             Project: Giraph
          Issue Type: Bug
    Affects Versions: 1.1.0
            Reporter: Rob Vesse


If a worker fails for any reason (e.g. an out-of-memory error), the BspServiceMaster attempts
to recover from a checkpoint.  However, this code does not guard against checkpointing being
disabled, which is the default Giraph behaviour, and so fails with the following ArrayIndexOutOfBoundsException:

{noformat}
2013-12-03 10:33:10,844 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 0 connections, (0 total connected) 0 failed, 0 failures total.
2013-12-03 10:33:10,844 INFO org.apache.giraph.partition.PartitionBalancer: balancePartitionsAcrossWorkers: Using algorithm static
2013-12-03 10:33:10,844 INFO org.apache.giraph.partition.PartitionUtils: analyzePartitionStats: Vertices - Mean: 333, Min: Worker(hostname=mbp-rvesse.home, MRtaskID=1, port=30001) - 333, Max: Worker(hostname=mbp-rvesse.home, MRtaskID=2, port=30002) - 334
2013-12-03 10:33:10,844 INFO org.apache.giraph.partition.PartitionUtils: analyzePartitionStats: Edges - Mean: 50000, Min: Worker(hostname=mbp-rvesse.home, MRtaskID=1, port=30001) - 49950, Max: Worker(hostname=mbp-rvesse.home, MRtaskID=2, port=30002) - 50100
2013-12-03 10:33:10,850 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 3 workers finished on superstep 2 on path /_hadoopBsp/job_201312031028_0001/_applicationAttemptsDir/0/_superstepDir/2/_workerFinishedDir
2013-12-03 10:33:10,850 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [mbp-rvesse.home_2, mbp-rvesse.home_3, mbp-rvesse.home_1]
2013-12-03 10:33:30,148 ERROR org.apache.giraph.master.BspServiceMaster: superstepChosenWorkerAlive: Missing chosen worker Worker(hostname=mbp-rvesse.home, MRtaskID=2, port=30002) on superstep 2
2013-12-03 10:33:30,148 INFO org.apache.giraph.master.MasterThread: masterThread: Coordination of superstep 2 took 19.31 seconds ended with state WORKER_FAILURE and is now on superstep 2
2013-12-03 10:33:30,156 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(BspServiceMaster.java:1274)
	at org.apache.giraph.master.MasterThread.run(MasterThread.java:139)
2013-12-03 10:33:30,157 FATAL org.apache.giraph.graph.GraphMapper: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.ArrayIndexOutOfBoundsException: -1, exiting...
java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.giraph.master.MasterThread.run(MasterThread.java:185)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.giraph.master.BspServiceMaster.getLastGoodCheckpoint(BspServiceMaster.java:1274)
	at org.apache.giraph.master.MasterThread.run(MasterThread.java:139)
{noformat}

It appears that getLastGoodCheckpoint in BspServiceMaster does not check whether the checkpoints
array is empty and simply tries to access the most recent checkpoint regardless. An index of -1 is
consistent with computing length - 1 on an empty array.
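
The missing guard could look roughly like the following Java sketch. It is not the Giraph code: the class CheckpointGuard, the method signature, the long[] representation of finalized checkpoint supersteps and the NO_CHECKPOINT sentinel are all assumptions made for illustration. It only shows the idea of checking for an empty list before indexing the last element.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Giraph. */
final class CheckpointGuard {
  /** Assumed sentinel meaning "no checkpoint is available". */
  static final long NO_CHECKPOINT = -1L;

  private CheckpointGuard() { }

  /**
   * Returns the highest finalized checkpoint superstep, or NO_CHECKPOINT
   * when there are none (e.g. because checkpointing is disabled).
   */
  static long lastGoodCheckpoint(long[] finalizedSupersteps) {
    // Guard first: indexing [length - 1] on an empty array is index -1.
    if (finalizedSupersteps == null || finalizedSupersteps.length == 0) {
      return NO_CHECKPOINT;
    }
    // Sort a copy so the caller's array is left untouched.
    long[] sorted = finalizedSupersteps.clone();
    Arrays.sort(sorted);
    return sorted[sorted.length - 1];
  }
}
```

A caller would then treat NO_CHECKPOINT as "cannot restart from a checkpoint" and fail the job with a clear message instead of an ArrayIndexOutOfBoundsException. Whether to return a sentinel or throw a descriptive exception is a design choice left open here.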



--
This message was sent by Atlassian JIRA
(v6.1#6144)
