giraph-dev mailing list archives

From "Jose Luis Larroque (JIRA)" <j...@apache.org>
Subject [jira] [Created] (GIRAPH-1101) Giraph hangs indefinitely when two or more workers process the same vertice on the same superstep
Date Sun, 31 Jul 2016 01:47:20 GMT
Jose Luis Larroque created GIRAPH-1101:
------------------------------------------

             Summary: Giraph hangs indefinitely when two or more workers process the same
vertice on the same superstep
                 Key: GIRAPH-1101
                 URL: https://issues.apache.org/jira/browse/GIRAPH-1101
             Project: Giraph
          Issue Type: Bug
    Affects Versions: 1.1.0
            Reporter: Jose Luis Larroque
            Priority: Minor


If two or more workers are processing the same vertex in the same superstep (for example,
when running multiple BFS traversals at the same time, depending on the data of course), the
entire superstep hangs. Every worker logs something like this:

16/07/29 22:49:19 INFO utils.ProgressableUtils: waitFor: Future result not ready yet java.util.concurrent.FutureTask@23a1ef14
16/07/29 22:49:19 INFO utils.ProgressableUtils: waitFor: Waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@5c571c52
16/07/29 22:50:19 INFO utils.ProgressableUtils: waitFor: Future result not ready yet java.util.concurrent.FutureTask@23a1ef14
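
The worker lines above look like a loop that waits on a Future with a timeout and logs each time the result is still missing. Below is a minimal Java sketch of that polling pattern. It is not Giraph's actual ProgressableUtils code: the class name PollingWait, the method signature, and the maxAttempts bound are all hypothetical, and the bound exists only so the example terminates instead of hanging the way the job does.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PollingWait {
    /**
     * Hypothetical helper (not a Giraph API): polls the future up to
     * maxAttempts times, waiting timeoutMs on each attempt.
     * Returns the attempt number on which the future completed,
     * or -1 if it never completed within the allowed attempts.
     */
    public static int waitFor(Future<?> future, long timeoutMs, int maxAttempts)
            throws InterruptedException, ExecutionException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                future.get(timeoutMs, TimeUnit.MILLISECONDS);
                return attempt;
            } catch (TimeoutException e) {
                // Mirrors the shape of the log message in the report.
                System.out.println("waitFor: Future result not ready yet " + future);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        // A future that nobody ever completes stands in for the stuck superstep.
        CompletableFuture<Void> stuck = new CompletableFuture<>();
        System.out.println("stuck future: " + waitFor(stuck, 50, 3));

        // An already completed future returns on the first attempt.
        System.out.println("done future: " + waitFor(CompletableFuture.completedFuture("ok"), 50, 3));
    }
}
```

Without a bound such as maxAttempts, a loop of this shape repeats the "not ready yet" message for as long as the awaited task never finishes, which matches the symptom reported here.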

And the master says:
16/07/29 21:43:19 INFO yarn.GiraphYarnTask: [STATUS: task-0] MASTER_ZOOKEEPER_ONLY - 0 finished
out of 4 on superstep 4
16/07/29 21:43:19 DEBUG master.BspServiceMaster: barrierOnWorkerList: Got finished worker
list = [], size = 0, worker list = [Worker(hostname=ip-172-31-23-9.sa-east-1.compute.internal,
MRtaskID=1, port=30001), Worker(hostname=ip-172-31-23-12.sa-east-1.compute.internal, MRtaskID=2,
port=30002), Worker(hostname=ip-172-31-23-11.sa-east-1.compute.internal, MRtaskID=3, port=30003),
Worker(hostname=ip-172-31-23-9.sa-east-1.compute.internal, MRtaskID=5, port=30005)], size
= 4 from /_hadoopBsp/giraph_yarn_application_1469827475142_0001/_applicationAttemptsDir/0/_superstepDir/4/_workerFinishedDir
16/07/29 21:43:19 INFO yarn.GiraphYarnTask: [STATUS: task-0] MASTER_ZOOKEEPER_ONLY - 0 finished
out of 4 on superstep 4
16/07/29 21:43:19 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:43:29 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:43:29 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:43:39 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:43:39 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:43:49 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:43:49 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:43:59 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:43:59 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:44:09 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:44:09 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:44:19 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:44:19 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:44:29 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:44:29 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:44:39 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:44:39 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:44:49 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:44:49 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:44:59 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:44:59 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:45:09 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:45:09 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:45:19 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:45:19 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:45:29 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:45:29 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:45:39 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:45:39 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:45:49 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:45:49 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:45:59 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:45:59 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:46:09 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:46:09 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:46:19 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:46:19 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:46:29 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:46:29 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:46:39 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:46:39 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:46:49 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:46:49 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:46:59 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:46:59 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:47:09 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:47:09 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:47:19 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:47:19 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:47:29 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:47:29 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:47:39 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:47:39 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:47:49 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:47:49 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:47:59 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:47:59 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:48:09 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:48:09 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:48:19 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:48:19 DEBUG master.BspServiceMaster: barrierOnWorkerList: Got finished worker
list = [], size = 0, worker list = [Worker(hostname=ip-172-31-23-9.sa-east-1.compute.internal,
MRtaskID=1, port=30001), Worker(hostname=ip-172-31-23-12.sa-east-1.compute.internal, MRtaskID=2,
port=30002), Worker(hostname=ip-172-31-23-11.sa-east-1.compute.internal, MRtaskID=3, port=30003),
Worker(hostname=ip-172-31-23-9.sa-east-1.compute.internal, MRtaskID=5, port=30005)], size
= 4 from /_hadoopBsp/giraph_yarn_application_1469827475142_0001/_applicationAttemptsDir/0/_superstepDir/4/_workerFinishedDir
16/07/29 21:48:19 INFO master.BspServiceMaster: barrierOnWorkerList: 0 out of 4 workers finished
on superstep 4 on path /_hadoopBsp/giraph_yarn_application_1469827475142_0001/_applicationAttemptsDir/0/_superstepDir/4/_workerFinishedDir
16/07/29 21:48:19 INFO master.BspServiceMaster: barrierOnWorkerList: Waiting on [ip-172-31-23-12.sa-east-1.compute.internal_2,
ip-172-31-23-9.sa-east-1.compute.internal_5, ip-172-31-23-11.sa-east-1.compute.internal_3,
ip-172-31-23-9.sa-east-1.compute.internal_1]
16/07/29 21:48:19 INFO yarn.GiraphYarnTask: [STATUS: task-0] MASTER_ZOOKEEPER_ONLY - 0 finished
out of 4 on superstep 4
16/07/29 21:48:19 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:48:29 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:48:29 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 21:48:39 DEBUG zk.PredicateLock: waitMsecs: Got timed signaled of false
16/07/29 21:48:39 DEBUG zk.PredicateLock: waitMsecs: Wait for 10000
16/07/29 22:50:19 INFO utils.ProgressableUtils: waitFor: Waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@5c571c52
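
The master log shows a barrier that compares the workers expected in the superstep against those that have reported as finished ("0 out of 4 workers finished"). The Java sketch below illustrates only that bookkeeping. The class BarrierCheck, the method pendingWorkers, and the shortened worker names are hypothetical and are not taken from BspServiceMaster.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BarrierCheck {
    /** Hypothetical helper: returns the workers that have not yet reported as finished. */
    public static Set<String> pendingWorkers(List<String> expected, List<String> finished) {
        Set<String> pending = new LinkedHashSet<>(expected);
        pending.removeAll(finished);
        return pending;
    }

    public static void main(String[] args) {
        // Placeholder names; the real log uses hostname_taskId identifiers.
        List<String> expected = Arrays.asList("worker_1", "worker_2", "worker_3", "worker_5");
        // Corresponds to "finished worker list = [], size = 0" in the log.
        List<String> finished = Collections.emptyList();

        Set<String> pending = pendingWorkers(expected, finished);
        int done = expected.size() - pending.size();
        System.out.println(done + " out of " + expected.size()
            + " workers finished; waiting on " + pending);
    }
}
```

The barrier can only be released once the pending set is empty, so if every worker is itself blocked waiting on a future, the master keeps polling indefinitely, as the repeated waitMsecs lines show.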



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
