giraph-dev mailing list archives

From "Byungnam Lim (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (GIRAPH-788) Giraph job suspends with exceptions when out-of-core options are set
Date Tue, 29 Oct 2013 12:07:31 GMT

     [ https://issues.apache.org/jira/browse/GIRAPH-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Byungnam Lim updated GIRAPH-788:
--------------------------------

    Description: 
When I run my code with the out-of-core graph/message options OFF, it works fine. But when the out-of-core graph/message options are ON, some workers throw the exceptions below and the whole job hangs.
{noformat}
java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@3c7659ab
	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@3c7659ab
	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:181)
	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:139)
	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:124)
	at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:87)
	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:221)
	at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:283)
	at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:327)
	at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:508)
	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:246)
	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:91)
	... 7 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: getOrCreatePartition: cannot retrieve partition 6
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)
	at java.util.concurrent.FutureTask.get(FutureTask.java:119)
	at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:300)
	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:173)
	... 16 more
Caused by: java.lang.IllegalStateException: getOrCreatePartition: cannot retrieve partition 6
	at org.apache.giraph.partition.DiskBackedPartitionStore.getOrCreatePartition(DiskBackedPartitionStore.java:243)
	at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:110)
	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:482)
	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:276)
	at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:172)
	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:267)
	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:211)
	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
	at java.util.concurrent.FutureTask.get(FutureTask.java:111)
	at org.apache.giraph.partition.DiskBackedPartitionStore.getOrCreatePartition(DiskBackedPartitionStore.java:228)
	... 13 more
Caused by: java.lang.NullPointerException
	at org.apache.giraph.partition.DiskBackedPartitionStore$GetPartition.call(DiskBackedPartitionStore.java:692)
	at org.apache.giraph.partition.DiskBackedPartitionStore$GetPartition.call(DiskBackedPartitionStore.java:658)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at org.apache.giraph.partition.DiskBackedPartitionStore$DirectExecutorService.execute(DiskBackedPartitionStore.java:972)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132)
	... 14 more
{noformat}

This exception occurs at superstep -1, i.e. while the input splits are being loaded (BspServiceWorker.loadVertices in the trace).
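The trace above is a chain of wrapped exceptions. The actual failure is the NullPointerException at the bottom, and each layer above it rethrows that failure. The following minimal Java sketch reproduces that wrapping pattern with a plain ExecutorService and shows how to walk getCause() down to the root exception. The class WrappedCauseDemo is hypothetical. It borrows the names GetPartition and getOrCreatePartition from the trace, but it is not Giraph's DiskBackedPartitionStore code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch (NOT Giraph code): shows how an NPE thrown inside a
 * Callable surfaces as "getOrCreatePartition: cannot retrieve partition N".
 */
public class WrappedCauseDemo {

  /** Stand-in for a partition-loading task whose state is unexpectedly null. */
  static class GetPartition implements Callable<String> {
    private final String partitionData; // null simulates missing state

    GetPartition(String partitionData) {
      this.partitionData = partitionData;
    }

    @Override
    public String call() {
      // Dereferencing null throws NullPointerException inside the task.
      return partitionData.trim();
    }
  }

  /** Stand-in for a store method that rethrows task failures. */
  static String getOrCreatePartition(ExecutorService executor, int partitionId) {
    Future<String> future = executor.submit(new GetPartition(null));
    try {
      // Future.get() wraps the task's exception in an ExecutionException.
      return future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted", e);
    } catch (ExecutionException e) {
      // Rethrown with a message that hides the real cause unless it is unwrapped.
      throw new IllegalStateException(
          "getOrCreatePartition: cannot retrieve partition " + partitionId, e);
    }
  }

  /** Follows getCause() links down to the innermost exception. */
  static Throwable rootCause(Throwable t) {
    Throwable current = t;
    while (current.getCause() != null) {
      current = current.getCause();
    }
    return current;
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      getOrCreatePartition(executor, 6);
    } catch (IllegalStateException e) {
      System.out.println("top:  " + e.getMessage());
      System.out.println("root: " + rootCause(e).getClass().getName());
    } finally {
      executor.shutdown();
    }
  }
}
```

Running main prints the top-level message "getOrCreatePartition: cannot retrieve partition 6" and the root cause java.lang.NullPointerException. The top-level message alone does not reveal why the partition could not be retrieved. The innermost frame does, and in the real trace that frame is DiskBackedPartitionStore$GetPartition.call.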

Strangely, the job finishes fine (i) when I run it with 10 or fewer workers, or (ii) when I run one of the examples in giraph-examples, specifically SimpleShortestPath, with 32 workers. The exceptions occur only when I run my own code with more than 10 workers, and in that case the job breaks as described above.

I found a similar (as far as I can tell, identical) problem reported earlier in GIRAPH-462, but that issue is marked 'Resolved' and 'Fixed'. Was it really fixed, or am I doing something wrong?

My input is 75 MB with about 1 million vertices, but my tests show that the problem does not depend on input size.

> Giraph job suspends with exceptions when out-of-core options are set
> --------------------------------------------------------------------
>
>                 Key: GIRAPH-788
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-788
>             Project: Giraph
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 1.0.0
>         Environment: Hadoop 0.20.203.0 with 32 cluster nodes; Giraph release-1.0 pulled on Oct. 19, 2013
>            Reporter: Byungnam Lim
>



--
This message was sent by Atlassian JIRA
(v6.1#6144)