giraph-user mailing list archives

From Hai Lan <lanhai1...@gmail.com>
Subject Re: Graph job self-killed after superstep 0 with large input
Date Fri, 22 May 2015 10:41:27 GMT
Hi Lukas

Thanks for the quick response. It seems I have found the problem.

On workers 2, 6, and 14, the logs show these errors:

2015-05-22 05:20:57,606 ERROR [netty-client-worker-1]
org.apache.giraph.comm.netty.NettyClient: Request failed
java.nio.channels.ClosedChannelException
[this entry repeats many times, with timestamps 05:20:57,606 to 05:20:57,607]
2015-05-22 05:20:57,607 WARN [netty-client-worker-1]
org.apache.giraph.comm.netty.handler.ResponseClientHandler:
exceptionCaught: Channel failed with remote address
bespin03c.umiacs.umd.edu/192.168.74.113:30005
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
	at sun.nio.ch.IOUtil.read(IOUtil.java:192)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
	at io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:446)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:871)
	at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:208)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:118)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:485)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:452)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:346)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
	at java.lang.Thread.run(Thread.java:745)


I checked the log on bespin03c.umiacs.umd.edu/192.168.74.113:30005 and it shows:


2015-05-22 05:20:50,028 ERROR [main]
org.apache.giraph.graph.GraphMapper: Caught an unrecoverable exception
waitFor: ExecutionException occurred while waiting for
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@7328027c
java.lang.IllegalStateException: waitFor: ExecutionException occurred
while waiting for
org.apache.giraph.utils.ProgressableUtils$FutureWaitable@7328027c
	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193)
	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:151)
	at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:136)
	at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:99)
	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:233)
	at org.apache.giraph.graph.GraphTaskManager.processGraphPartitions(GraphTaskManager.java:756)
	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:335)
	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:93)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.util.concurrent.ExecutionException:
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
	at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.getResult(ProgressableUtils.java:327)
	at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:187)
	... 14 more
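
The underlying failure in a trace like this is the last "Caused by:" entry, here the OutOfMemoryError. A minimal Python sketch for pulling it out of a saved log, assuming a wrapped "Caused by:" line continues on the next line as it does in this mail; `root_cause` is a hypothetical helper name, not part of Giraph or Hadoop:

```python
def root_cause(trace: str) -> str:
    """Return the innermost 'Caused by:' entry of a Java stack trace, or ''."""
    lines = [ln.strip() for ln in trace.splitlines()]
    cause = ""
    for i, ln in enumerate(lines):
        if not ln.startswith("Caused by:"):
            continue
        text = ln[len("Caused by:"):].strip()
        # Long lines may be wrapped; join a continuation line that is
        # neither a stack frame nor another "Caused by:" entry.
        if i + 1 < len(lines):
            nxt = lines[i + 1]
            if nxt and not nxt.startswith(("at ", "...", "Caused by:")):
                text = f"{text} {nxt}"
        cause = text  # later "Caused by:" entries are deeper causes
    return cause


# Shortened copy of the trace above, including the wrapped "Caused by:" line.
trace = """java.lang.IllegalStateException: waitFor: ExecutionException occurred
\tat org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193)
Caused by: java.util.concurrent.ExecutionException:
java.lang.OutOfMemoryError: GC overhead limit exceeded
\tat java.util.concurrent.FutureTask.report(FutureTask.java:122)
\t... 14 more"""

print(root_cause(trace))
```

Taking the last entry rather than the first matters when a trace has several nested causes, since the deepest one is usually the real failure.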



So if I keep using the default hash partitioning, can the problem only be
solved by adding more memory to the cluster?


Thanks


Hai




Hai Lan, PhD student
hlan@umd.edu <cfu@umd.edu>
Department of Geographical Science
University of Maryland, College Park
1104 LeFrak Hall
College Park, MD 20742, USA

On Fri, May 22, 2015 at 6:32 AM, Lukas Nalezenec <
lukas.nalezenec@firma.seznam.cz> wrote:

>  On 22.5.2015 12:25, Hai Lan wrote:
>
> Missing chosen workers [Worker(hostname=bespin05.umiacs.umd.edu, MRtaskID=2, port=30002),
> Worker(hostname=bespin04d.umiacs.umd.edu, MRtaskID=6, port=30006),
> Worker(hostname=bespin03a.umiacs.umd.edu, MRtaskID=14, port=30014)] on superstep 0
>
>
> Hi,
> Check the logs to see what happened on the missing workers.
> Lukas
>
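
The "Missing chosen workers" message quoted above names the hosts and task IDs whose logs need checking. A small Python sketch for extracting them; the regular expression is an assumption based only on the message format shown in this thread, and `missing_workers` is a hypothetical helper, not a Giraph API:

```python
import re

# Matches entries such as:
#   Worker(hostname=bespin05.umiacs.umd.edu, MRtaskID=2, port=30002)
# \s* also tolerates a line wrap between the fields.
WORKER_RE = re.compile(
    r"Worker\(hostname=(?P<host>[^,]+),\s*MRtaskID=(?P<task>\d+),\s*port=(?P<port>\d+)\)"
)


def missing_workers(message: str):
    """Return (hostname, task_id, port) for each worker named in the message."""
    return [
        (m["host"], int(m["task"]), int(m["port"]))
        for m in WORKER_RE.finditer(message)
    ]


message = (
    "Missing chosen workers [Worker(hostname=bespin05.umiacs.umd.edu, MRtaskID=2, port=30002),\n"
    "Worker(hostname=bespin04d.umiacs.umd.edu, MRtaskID=6, port=30006), "
    "Worker(hostname=bespin03a.umiacs.umd.edu,\nMRtaskID=14, port=30014)] on superstep 0"
)

for host, task, port in missing_workers(message):
    print(f"check the log of task {task} on {host} (port {port})")
```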
