giraph-user mailing list archives

From John Yost <soozandjohny...@gmail.com>
Subject Giraph job hangs and is eventually killed
Date Sat, 05 Apr 2014 10:24:50 GMT
Hi Everyone,

I have a shortest path implementation that completes and outputs the
correct results to a counter, but then hangs after the last superstep and
is eventually killed by Hadoop.

Here's the output from the console:

[main-SendThread(localhost.localdomain:2181)] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to server
localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate
using SASL (unknown error)
[main-SendThread(localhost.localdomain:2181)] INFO
org.apache.zookeeper.ClientCnxn - Socket connection established to
localhost.localdomain/127.0.0.1:2181, initiating session
[main-SendThread(localhost.localdomain:2181)] INFO
org.apache.zookeeper.ClientCnxn - Session establishment complete on server
localhost.localdomain/127.0.0.1:2181, sessionid = 0x1451fc674a30007,
negotiated timeout = 40000
14/04/04 22:19:44 INFO job.JobProgressTracker: Data from 1 workers -
Storing data: 0 out of 11 vertices stored; 0 out of 1 partitions stored;
min free memory on worker 1 - 119.73MB, average 119.73MB
14/04/04 22:19:45 INFO mapred.JobClient:  map 100% reduce 0%
14/04/04 22:19:49 INFO job.JobProgressTracker: Data from 1 workers -
Storing data: 0 out of 11 vertices stored; 0 out of 1 partitions stored;
min free memory on worker 1 - 119.73MB, average 119.73MB
14/04/04 22:19:54 INFO job.JobProgressTracker: Data from 1 workers -
Storing data: 0 out of 11 vertices stored; 0 out of 1 partitions stored;
min free memory on worker 1 - 119.44MB, average 119.44MB
1

This is the stack trace I see in Hadoop after the job is killed:

Caused by: java.lang.IllegalStateException: waitFor:
ExecutionException occurred while waiting for
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@43349eef
	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193)
	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:151)
	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:136)
	at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:99)
	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:233)
	at org.apache.giraph.worker.BspServiceWorker.saveVertices(BspServiceWorker.java:1033)
	at org.apache.giraph.worker.BspServiceWorker.cleanup(BspServiceWorker.java:1179)
	at org.apache.giraph.graph.GraphTaskManager.cleanup(GraphTaskManager.java:843)
	at org.apache.giraph.graph.GraphMapper.cleanup(GraphMapper.java:81)
	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:93)
	... 7 more
Caused by: java.util.concurrent.ExecutionException:
java.lang.IllegalStateException:
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed
to create file /user/prototype/giraph/twitter-path-result/_temporary/_attempt_201404012018_0003_m_000001_0/part-m-00001
for DFSClient_attempt_201404012018_0003_m_000001_0_-1149212770_1 on
client 127.0.0.1 because current leaseholder is trying to recreate
file.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1452)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1324)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1266)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:668)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:647)
	at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

I realize that the root cause appears to be within Hadoop and not Giraph,
but I am wondering whether there is a Giraph configuration parameter I am
missing. In researching the HDFS exception (there are not many posts on it,
BTW), I found one responder who suggested that this exception is caused by
speculative execution being enabled.

Also, I tested a standard MapReduce job writing to the same data block and
it worked fine, so I don't think HDFS itself is the problem (a corrupt data
block, etc.).

Any ideas?

--John
