giraph-dev mailing list archives

From Hassan Eslami <hsn.esl...@gmail.com>
Subject Re: Problems with running page rank using OutOfCore setting
Date Sun, 16 Jul 2017 21:04:59 GMT
Hi,

giraph.useOutOfCoreMessages is no longer in use.

The main problem here is that you are using the default flow control mechanism
(NoOpFlowControl), which leads to a large number of outstanding/received
messages. As a consequence, memory fills up very quickly and the job can fail
for various reasons. Please use the following options instead:

 -Dgiraph.isStaticGraph=false -Dgiraph.useOutOfCoreGraph=true
-Dgiraph.waitForPerWorkerRequests=true

Note: giraph.isStaticGraph is set to false because the static graph option
has a known bug with the out-of-core mechanism.
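
As a rough sketch, assuming everything else in your original command stays
unchanged, the adjusted invocation could be assembled as below. Only the
-Dgiraph.* flags differ from your command: giraph.useOutOfCoreMessages is
dropped, giraph.isStaticGraph is set to false, and
giraph.waitForPerWorkerRequests is added. The array names GIRAPH_OPTS and CMD
are just illustrative, and the script only prints the command (a dry run)
rather than submitting the job.

```shell
#!/usr/bin/env bash
# Sketch only: the original command with the suggested flags swapped in.
# All arguments other than the -Dgiraph.* options are copied unchanged
# from the original invocation.

GIRAPH_OPTS=(
  -Dgiraph.yarn.task.heap.mb=58880
  -Dgiraph.isStaticGraph=false            # static graph has a known bug with out-of-core
  -Dgiraph.useOutOfCoreGraph=true
  -Dgiraph.waitForPerWorkerRequests=true  # use instead of the default NoOpFlowControl
)

CMD=(
  yarn jar giraph-examples-1.2.0-for-hadoop-2.6.0-jar-with-dependencies.jar
  org.apache.giraph.GiraphRunner
  "${GIRAPH_OPTS[@]}"
  org.apache.giraph.examples.PageRankComputation
  -vif org.apache.giraph.examples.LongDoubleNullTextInputFormat
  -vip /user/darshan/AdjList/
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat
  -op /user/darshan/giraph_3.5B_ooc/
  -w 8
  -mc org.apache.giraph.examples.RandomWalkVertexMasterCompute
  -wc org.apache.giraph.examples.RandomWalkWorkerContext
  -ca org.apache.giraph.examples.RandomWalkVertex.teleportationProbability=0.15f
  -ca org.apache.giraph.examples.RandomWalkVertex.maxSupersteps=21
)

# Dry run: print the assembled command for review.
# To actually submit the job, run "${CMD[@]}" instead of echoing it.
echo "${CMD[@]}"
```

Keeping the flags in an array makes it easy to see which options changed and
to review the full command before submitting it.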

Hope it helps,
Hassan

On Sun, Jul 16, 2017 at 1:54 PM, Darshan Mallenahalli Shankaralingappa <
dshankaralingappa@ntent.com> wrote:

> Hi,
>
> I am trying to run the PageRank algorithm using Giraph on a 3.5 billion
> node web graph on a relatively small Hadoop cluster (6 nodes with 225GB
> RAM total).
> I set giraph.useOutOfCoreGraph and giraph.useOutOfCoreMessages to true,
> and the application was killed after some time.
>
> I am running the giraph job using this command:
>  yarn jar giraph-examples-1.2.0-for-hadoop-2.6.0-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner -Dgiraph.yarn.task.heap.mb=58880
> -Dgiraph.isStaticGraph=true -Dgiraph.useOutOfCoreGraph=true
> -Dgiraph.useOutOfCoreMessages=true org.apache.giraph.examples.PageRankComputation
> -vif org.apache.giraph.examples.LongDoubleNullTextInputFormat -vip
> /user/darshan/AdjList/ -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat
> -op /user/darshan/giraph_3.5B_ooc/ -w 8 -mc org.apache.giraph.examples.RandomWalkVertexMasterCompute
> -wc org.apache.giraph.examples.RandomWalkWorkerContext -ca
> org.apache.giraph.examples.RandomWalkVertex.teleportationProbability=0.15f
> -ca org.apache.giraph.examples.RandomWalkVertex.maxSupersteps=21
>
> Here is a log from the zookeeper:
>
> 2017-07-12 08:08:35,026 WARN [netty-client-worker-1] org.apache.giraph.comm.netty.handler.ResponseClientHandler: exceptionCaught: Channel failed with remote address hdpbcn-01.lv.ntent.com/10.100.21.118:30006
>
> java.lang.ArrayIndexOutOfBoundsException: 1075052547
>         at org.apache.giraph.comm.flow_control.NoOpFlowControl.getAckSignalFlag(NoOpFlowControl.java:52)
>         at org.apache.giraph.comm.netty.NettyClient.messageReceived(NettyClient.java:796)
>         at org.apache.giraph.comm.netty.handler.ResponseClientHandler.channelRead(ResponseClientHandler.java:87)
>         at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
>         at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
>         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153)
>         at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
>         at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
>         at org.apache.giraph.comm.netty.InboundByteCounter.channelRead(InboundByteCounter.java:74)
>         at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
>         at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
>         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:126)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:485)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:452)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:346)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
>         at java.lang.Thread.run(Thread.java:745)
>
>
> I think this issue is related to the messaging stack rather than the
> algorithm.
> If not, can someone please help me with this or at least point me in the
> right direction?
>
> Cheers,
> Darshan
>
