giraph-user mailing list archives

From Hai Lan <lanhai1...@gmail.com>
Subject Re: OutOfMemoryError: Java heap space during Large graph running
Date Sun, 23 Oct 2016 17:47:12 GMT
More info:

If I add -Dgiraph.useOutOfCoreGraph=true, the job runs successfully, but
superstep -1 is extremely slow. If I do not add
-Dgiraph.useOutOfCoreGraph=true, loading is much faster, but the job fails
while waiting for roughly the last 10 workers to finish superstep -1. The
error is:

org.apache.giraph.master.BspServiceMaster: *barrierOnWorkerList: Missing
chosen workers* [Worker(hostname=trantor17.umiacs.umd.edu, MRtaskID=124,
port=30124), Worker(hostname=trantor17.umiacs.umd.edu, MRtaskID=126,
port=30126), Worker(hostname=trantor17.umiacs.umd.edu, MRtaskID=128,
port=30128), Worker(hostname=trantor17.umiacs.umd.edu, MRtaskID=130,
port=30130)] on superstep -1
2016-10-23 10:40:16,358 ERROR [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.MasterThread: masterThread: Master algorithm
failed with IllegalStateException
java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed
during input split (currently not supported)
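For reference, this is roughly how I am launching the out-of-core run. It is only a sketch: the example computation class, input/output formats, and paths below are placeholders rather than my real job, and the option names are the Giraph 1.1 / Hadoop 2.x ones as I understand them.

```shell
# Sketch of an out-of-core Giraph launch (placeholders for jar, class, paths).
# Note: on YARN/MR2 the map JVM heap is set via mapreduce.map.java.opts
# (not mapred.child.java.opts), and -Xmx needs a unit suffix --
# "-Xmx16384" means 16384 *bytes*, while "-Xmx9000m" means 9000 MB.
hadoop jar giraph-examples-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -Dmapreduce.map.memory.mb=12000 \
  -Dmapreduce.map.java.opts="-Xmx9000m" \
  -Dgiraph.useOutOfCoreGraph=true \
  org.apache.giraph.examples.SimplePageRankComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/hai/input \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/hai/output \
  -w 248 \
  -ca giraph.maxPartitionsInMemory=10
```

The heap (-Xmx) is deliberately left below mapreduce.map.memory.mb so the container has headroom for non-heap memory; otherwise YARN kills the task for exceeding its container limit.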

This error looks just like https://issues.apache.org/jira/browse/GIRAPH-904,
but there are no upper-case characters in my hostnames.
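For context, a quick back-of-envelope check (a sketch only: it assumes the ~1.6T input from my earlier mail below is split uniformly across the 248 workers, which is only roughly true) suggests why the default -Xmx2868m heap that YARN falls back to cannot hold a worker's share of the graph:

```python
# Back-of-envelope: raw input bytes per worker if splits were uniform.
# Numbers from this thread: ~1.6 TB input, 248 workers, -Xmx2868m default.
total_input_bytes = 1.6e12
workers = 248
default_heap_gb = 2868 / 1024.0

per_worker_gb = total_input_bytes / workers / 1024**3
print(f"~{per_worker_gb:.1f} GB of raw input per worker "
      f"vs ~{default_heap_gb:.1f} GB default heap")
```

The in-memory representation of vertices and edges is usually larger than the on-disk input, so the real gap is even wider than this estimate.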

Any ideas about this?

Many Thanks,

Hai




On Sun, Oct 23, 2016 at 8:36 AM, Hai Lan <lanhai1988@gmail.com> wrote:

> Thanks, Agrta, for your response. How exactly do I increase the min and
> max RAM? (In which conf file, or via which command/arguments? My
> giraph-site.xml is empty, as it is by default.)
>
> From what I found online about increasing the heap size (I am not sure it
> is the same thing as the min/max RAM size you mentioned), many people
> suggest increasing mapred.child.java.opts or HADOOP_DATANODE_OPTS.
>
> But those did not help. My problem happens during "VertexInputSplitsCallable:
> readVertexInputSplit:", so I tried increasing mapreduce.map.memory.mb
> and decreasing the number of containers/workers. Currently I am using 248
> workers with mapreduce.map.memory.mb=12000 and ratio=0.7. This helps, but I
> now face new problems:
>
> 1. Superstep -1 is extremely slow; it takes 7-8 hours to load a 150G
> graph:
> e.g.
> org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 106 out
> of 248 workers finished on superstep -1 on path
> /_hadoopBsp/job_1477020594559_0012/_vertexInputSplitDoneDir
>
> In the logs I see lines like:
> INFO [main] org.apache.giraph.comm.netty.NettyClient:
> logInfoAboutOpenRequests: Waiting interval of 15000 msecs, 2499 open
> requests, waiting for it to be <= 0, MBytes/sec received = 0.0001,
> MBytesReceived = 0.0058, ave received req MBytes = 0, secs waited = 92.12
> MBytes/sec sent = 10.4373, MBytesSent = 961.4983, ave sent req MBytes =
> 0.3244, secs waited = 92.12
>
> Finishing those 2499 open requests will take a very long time. *Is this
> normal?*
>
> 2. I tried the out-of-core graph option, but I am not sure I am using it
> correctly. I did add -Dgiraph.useOutOfCoreGraph=true -ca isStaticGraph=true,giraph.maxPartitionsInMemory=10.
> But how do I know whether it is working?
>
> I suspect that when I try the 15T graph, the problem will be worse. What
> should I do?
>
> Thanks for your help.
>
> Best,
> Hai
>
>
> On Sun, Oct 23, 2016 at 7:11 AM, Agrta Rawat <agrta.rawat@gmail.com>
> wrote:
>
>> Hi Hai,
>>
>> Please check your giraph configurations. Try increasing min and max RAM
>> size in your configurations.
>> This should help.
>>
>> Regards,
>> Agrta Rawat
>>
>>
>> On Sat, Oct 22, 2016 at 7:46 PM, Hai Lan <lanhai1988@gmail.com> wrote:
>>
>>> Can anyone help with this?
>>>
>>> Thanks a lot!
>>>
>>>
>>> On Thu, Oct 20, 2016 at 9:48 PM, Hai Lan <lanhai1988@gmail.com> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I'm facing a problem when running a large graph job (currently 1.6T,
>>>> eventually 16T): it always fails with java.lang.OutOfMemoryError: Java
>>>> heap space after loading a certain number of vertices (around
>>>> 59,000,000). I tried adding:
>>>> -Dgiraph.useOutOfCoreGraph=true
>>>>  -Dmapred.child.java.opts="-XX:-UseGCOverheadLimit" OR
>>>> -Dmapred.child.java.opts="-Xmx16384"
>>>>  -Dgiraph.yarn.task.heap.mb=36570
>>>>
>>>> but the problem remains, though I can see those values shown in the
>>>> metadata.
>>>>
>>>> I'm not sure whether the max memory value in this VertexInputSplitsCallable
>>>> log line refers to the Java heap size:
>>>> INFO [load-0] org.apache.giraph.worker.VertexInputSplitsCallable:
>>>> readVertexInputSplit: Loaded 46975802 vertices at 68977.49310291892
>>>> vertices/sec 0 edges at 0.0 edges/sec Memory (free/total/max) = 475.08M /
>>>> 2759.00M / 2759.00M
>>>>
>>>> But I noticed that the main log *always* shows:
>>>> INFO [AsyncDispatcher event handler] org.apache.hadoop.mapred.JobConf:
>>>> Task java-opts do not specify heap size. Setting task attempt jvm max heap
>>>> size to -Xmx2868m
>>>> *no matter what arguments I add*, even when I run normal Hadoop jobs.
>>>>
>>>> Any ideas about this? Following is the log.
>>>>
>>>> 2016-10-20 21:25:49,008 ERROR [netty-client-worker-2]
>>>> org.apache.giraph.comm.netty.NettyClient: Request failed
>>>> java.lang.OutOfMemoryError: Java heap space
>>>> at io.netty.buffer.UnpooledHeapByteBuf.<init>(UnpooledHeapByteBuf.java:45)
>>>> at io.netty.buffer.UnpooledByteBufAllocator.newHeapBuffer(UnpooledByteBufAllocator.java:43)
>>>> at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:136)
>>>> at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:127)
>>>> at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:85)
>>>> at org.apache.giraph.comm.netty.handler.RequestEncoder.write(RequestEncoder.java:81)
>>>> at io.netty.channel.DefaultChannelHandlerContext.invokeWrite(DefaultChannelHandlerContext.java:645)
>>>> at io.netty.channel.DefaultChannelHandlerContext.access$2000(DefaultChannelHandlerContext.java:29)
>>>> at io.netty.channel.DefaultChannelHandlerContext$WriteTask.run(DefaultChannelHandlerContext.java:906)
>>>> at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:36)
>>>> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 2016-10-20 21:25:55,299 ERROR [netty-client-worker-1]
>>>> org.apache.giraph.comm.netty.NettyClient: Request failed
>>>> java.lang.OutOfMemoryError: Java heap space
>>>> at io.netty.buffer.UnpooledHeapByteBuf.<init>(UnpooledHeapByteBuf.java:45)
>>>> at io.netty.buffer.UnpooledByteBufAllocator.newHeapBuffer(UnpooledByteBufAllocator.java:43)
>>>> at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:136)
>>>> at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:127)
>>>> at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:85)
>>>> at org.apache.giraph.comm.netty.handler.RequestEncoder.write(RequestEncoder.java:81)
>>>> at io.netty.channel.DefaultChannelHandlerContext.invokeWrite(DefaultChannelHandlerContext.java:645)
>>>> at io.netty.channel.DefaultChannelHandlerContext.access$2000(DefaultChannelHandlerContext.java:29)
>>>> at io.netty.channel.DefaultChannelHandlerContext$WriteTask.run(DefaultChannelHandlerContext.java:906)
>>>> at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:36)
>>>> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 2016-10-20 21:26:06,731 ERROR [main] org.apache.giraph.graph.GraphMapper:
>>>> Caught an unrecoverable exception waitFor: ExecutionException occurred
>>>> while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6737a445
>>>> java.lang.IllegalStateException: waitFor: ExecutionException occurred
>>>> while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6737a445
>>>> at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193)
>>>> at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:151)
>>>> at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:136)
>>>> at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:99)
>>>> at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:233)
>>>> at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:316)
>>>> at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:409)
>>>> at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:629)
>>>> at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:284)
>>>> at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:93)
>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>> Caused by: java.util.concurrent.ExecutionException:
>>>> java.lang.OutOfMemoryError: Java heap space
>>>> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>>> at java.util.concurrent.FutureTask.get(FutureTask.java:202)
>>>> at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:312)
>>>> at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:185)
>>>> ... 16 more
>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>> at org.apache.giraph.utils.UnsafeByteArrayOutputStream.<init>(UnsafeByteArrayOutputStream.java:81)
>>>> at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createExtendedDataOutput(ImmutableClassesGiraphConfiguration.java:1161)
>>>> at org.apache.giraph.comm.SendPartitionCache.addVertex(SendPartitionCache.java:77)
>>>> at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:248)
>>>> at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:231)
>>>> at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:267)
>>>> at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:211)
>>>> at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
>>>> at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> 2016-10-20 21:26:06,737 ERROR [main] org.apache.giraph.worker.BspServiceWorker:
>>>> unregisterHealth: Got failure, unregistering health on
>>>> /_hadoopBsp/job_1476386340018_0175/_applicationAttemptsDir/0
>>>> /_superstepDir/-1/_workerHealthyDir/hadoop18.umd.com_23 on superstep -1
>>>> 2016-10-20 21:26:06,746 WARN [main] org.apache.hadoop.mapred.YarnChild:
>>>> Exception running child : java.lang.IllegalStateException: run: Caught
>>>> an unrecoverable exception waitFor: ExecutionException occurred while
>>>> waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6737a445
>>>> at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:104)
>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>> Caused by: java.lang.IllegalStateException: waitFor:
>>>> ExecutionException occurred while waiting for
>>>> org.apache.giraph.utils.ProgressableUtils$FutureWaitable@6737a445
>>>> at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193)
>>>> at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:151)
>>>> at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:136)
>>>> at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:99)
>>>> at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:233)
>>>> at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:316)
>>>> at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:409)
>>>> at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:629)
>>>> at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:284)
>>>> at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:93)
>>>> ... 7 more
>>>> Caused by: java.util.concurrent.ExecutionException:
>>>> java.lang.OutOfMemoryError: Java heap space
>>>> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>>> at java.util.concurrent.FutureTask.get(FutureTask.java:202)
>>>> at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:312)
>>>> at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:185)
>>>> ... 16 more
>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>> at org.apache.giraph.utils.UnsafeByteArrayOutputStream.<init>(UnsafeByteArrayOutputStream.java:81)
>>>> at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createExtendedDataOutput(ImmutableClassesGiraphConfiguration.java:1161)
>>>> at org.apache.giraph.comm.SendPartitionCache.addVertex(SendPartitionCache.java:77)
>>>> at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:248)
>>>> at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:231)
>>>> at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:267)
>>>> at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:211)
>>>> at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
>>>> at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>>
>>>>
>>>> Thank you so much!
>>>>
>>>> Best,
>>>>
>>>> Hai
>>>>
>>>
>>>
>>
>
