giraph-user mailing list archives

From Hassan Eslami <hsn.esl...@gmail.com>
Subject Re: Out of core thread exception : java concurrent exception
Date Mon, 16 May 2016 19:10:45 GMT
Ramesh,

The out-of-core mechanism keeps spilled data in files in the local job
directory, which is usually derived from Hadoop's "mapred.job.id". The job
ID should differ from one run to the next, so separate runs that use the
out-of-core mechanism should not conflict with each other. However, you
may have manually overridden the related Hadoop/YARN configuration, in
which case a conflict is possible: if you run your jobs one after another,
a later job may make decisions based on files left behind by a previous
job. This may be one reason you are getting this error. Please make sure
the local job directory differs from run to run, or simply delete the
"_bsp/_partitions" directory from your local job directory every time you
run a job with out-of-core enabled.
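
The cleanup step could be scripted along the following lines. This is only
a minimal sketch: the function name clean_ooc_partitions is made up for
illustration, and the local job directory is passed in as an argument
because its actual location depends on your Hadoop/YARN configuration. Only
the "_bsp/_partitions" sub-path comes from the explanation above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Giraph): remove leftover out-of-core
# partition files before starting a new run.
# Assumption: the caller knows the local job directory on this node.
clean_ooc_partitions() {
  local_job_dir="$1"
  if [ -z "$local_job_dir" ]; then
    echo "usage: clean_ooc_partitions <local-job-dir>" >&2
    return 1
  fi
  partitions_dir="$local_job_dir/_bsp/_partitions"
  if [ -d "$partitions_dir" ]; then
    # Delete only the spilled partition data, not the whole job directory.
    rm -rf -- "$partitions_dir"
    echo "removed $partitions_dir"
  else
    echo "nothing to remove under $local_job_dir"
  fi
}
```

Since the spilled files are local to each worker, a helper like this would
presumably have to be run on every worker node before the job is submitted.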

As a side note, you no longer need to specify the out-of-core message
options (giraph.maxMessagesInMemory=100,giraph.useOutOfCoreMessages=true).
Also, you can try a new out-of-core feature with which you don't have to
specify the number of partitions in memory either (so you can also drop
giraph.maxPartitionsInMemory=5). This new feature has been extensively
tested, but it is still under review and has not been pushed to the code
base yet. You can access it here: https://reviews.facebook.net/D55479
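
For reference, the command from your message with the two message options
removed might look like the sketch below. Everything in it is copied from
the quoted script except the shell variable GIRAPH_OPTS and the wrapper
function run_connected_components, which are introduced here only for
readability. giraph.maxPartitionsInMemory=5 is kept on the assumption that
you are running the current code base rather than the patch under review.

```shell
#!/bin/sh
# Custom arguments from the quoted script, minus
# giraph.maxMessagesInMemory and giraph.useOutOfCoreMessages.
GIRAPH_OPTS="giraph.userPartitionCount=150,giraph.SplitMasterWorker=true,giraph.isStaticGraph=true,giraph.maxPartitionsInMemory=5,mapred.map.max.attempts=2,giraph.useOutOfCoreGraph=true"

# Hypothetical wrapper: defining it does not submit anything; call
# run_connected_components to launch the job.
run_connected_components() {
  hadoop jar \
    /usr/local/giraph.back.1.2.0/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.7.0-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner \
    -Dmapreduce.task.timeout=12000000 \
    -Dmapred.job.tracker=ip-172-31-42-220.eu-west-1.compute.internal:8021 \
    -Dmapreduce.map.memory.mb=23480 -Dmapreduce.map.java.opts=-Xmx22480m \
    org.apache.giraph.examples.ConnectedComponentsComputation \
    -vif org.apache.giraph.io.formats.IntIntNullTextInputFormat -vip /test/input_10M \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /test/ouput_10M \
    -w 5 -ca "$GIRAPH_OPTS"
}
```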

Best,
Hassan

On Sat, May 14, 2016 at 10:46 PM, Ramesh Krishnan <ramesh.154089@gmail.com>
wrote:

> Thanks, Hassan. I have removed checkpointing, but I am now getting a
> different error.
>
> *Script :*
>
> hadoop jar
> /usr/local/giraph.back.1.2.0/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.7.0-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner -Dmapreduce.task.timeout=12000000
> -Dmapred.job.tracker=ip-172-31-42-220.eu-west-1.compute.internal:8021
> -Dmapreduce.map.memory.mb=23480 -Dmapreduce.map.java.opts=-Xmx22480m
> org.apache.giraph.examples.ConnectedComponentsComputation   -vif
> org.apache.giraph.io.formats.IntIntNullTextInputFormat -vip /test/input_10M
> -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
> /test/ouput_10M -w 5 -ca
> giraph.userPartitionCount=150,giraph.SplitMasterWorker=true,giraph.isStaticGraph=true,giraph.maxPartitionsInMemory=5,mapred.map.max.attempts=2,giraph.maxMessagesInMemory=100,giraph.useOutOfCoreMessages=true,giraph.useOutOfCoreGraph=true
>
> *Exception:*
>
> 2016-05-15 05:34:28,113 INFO [ooc-io-0] org.apache.giraph.ooc.OutOfCoreIOCallable: call: execution of IO command LoadPartitionIOCommand: (partitionId = 107, superstep = 0) failed!
> 2016-05-15 05:34:28,114 ERROR [ooc-io-0] org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
> java.lang.RuntimeException: java.io.EOFException
> 	at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:76)
> 	at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:30)
> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.EOFException
> 	at java.io.DataInputStream.readInt(DataInputStream.java:392)
> 	at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:47)
> 	at org.apache.giraph.ooc.data.DiskBackedPartitionStore.readOutEdges(DiskBackedPartitionStore.java:286)
> 	at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadInMemoryPartitionData(DiskBackedPartitionStore.java:329)
> 	at org.apache.giraph.ooc.data.OutOfCoreDataManager.loadPartitionData(OutOfCoreDataManager.java:195)
> 	at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadPartitionData(DiskBackedPartitionStore.java:360)
> 	at org.apache.giraph.ooc.io.LoadPartitionIOCommand.execute(LoadPartitionIOCommand.java:64)
> 	at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:72)
> 	... 6 more
> 2016-05-15 05:34:28,117 INFO [ooc-io-0] org.apache.giraph.ooc.OutOfCoreIOCallableFactory: afterExecute: an out-of-core thread terminated unexpectedly with java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.EOFException
> 2016-05-15 05:34:28,441 INFO [compute-0] org.apache.giraph.ooc.FixedOutOfCoreEngine: doneProcessingPartition: processing partition 117 is done!
> 2016-05-15 05:34:29,111 INFO [compute-0] org.apache.giraph.ooc.FixedOutOfCoreEngine: doneProcessingPartition: processing partition 27 is done!
> 2016-05-15 05:34:29,620 INFO [compute-0] org.apache.giraph.ooc.FixedOutOfCoreEngine: doneProcessingPartition: processing partition 127 is done!
> 2016-05-15 05:34:30,123 INFO [compute-0] org.apache.giraph.ooc.FixedOutOfCoreEngine: doneProcessingPartition: processing partition 22 is done!
> 2016-05-15 05:34:30,123 INFO [compute-0] org.apache.giraph.ooc.FixedOutOfCoreEngine: getNextPartition: waiting until a partition becomes available!
> 2016-05-15 05:34:31,123 ERROR [compute-0] org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
> java.lang.RuntimeException: Job Failed due to a failure in an out-of-core IO thread
> 	at org.apache.giraph.ooc.FixedOutOfCoreEngine.getNextPartition(FixedOutOfCoreEngine.java:81)
> 	at org.apache.giraph.ooc.data.DiskBackedPartitionStore.getNextPartition(DiskBackedPartitionStore.java:187)
> 	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:153)
> 	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:69)
> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> 2016-05-15 05:34:31,124 ERROR [main] org.apache.giraph.graph.GraphMapper: Caught an unrecoverable exception Exception occurred
> java.lang.IllegalStateException: Exception occurred
> 	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:253)
> 	at org.apache.giraph.graph.GraphTaskManager.processGraphPartitions(GraphTaskManager.java:761)
> 	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:349)
> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Job Failed due to a failure in an out-of-core IO thread
> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:206)
> 	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:250)
> 	... 10 more
> Caused by: java.lang.RuntimeException: Job Failed due to a failure in an out-of-core IO thread
> 	at org.apache.giraph.ooc.FixedOutOfCoreEngine.getNextPartition(FixedOutOfCoreEngine.java:81)
> 	at org.apache.giraph.ooc.data.DiskBackedPartitionStore.getNextPartition(DiskBackedPartitionStore.java:187)
> 	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:153)
> 	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:69)
> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> 2016-05-15 05:34:31,125 ERROR [main] org.apache.giraph.worker.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_1463146675144_0036/_applicationAttemptsDir/0/_superstepDir/0/_workerHealthyDir/ip-172-31-37-39.eu-west-1.compute.internal_2 on superstep 0
>
>
>
> On Sun, May 15, 2016 at 3:54 AM, Hassan Eslami <hsn.eslami@gmail.com>
> wrote:
>
>> Hi Ramesh!
>>
>> Thanks for bringing this up, and thanks for trying out the new
>> out-of-core mechanism. The new out-of-core mechanism has not been
>> integrated with checkpointing yet. This is part of an ongoing project, and
>> we should have the integration within a few weeks. In the meantime, you can
>> try out-of-core without checkpointing enabled.
>>
>> Best,
>> Hassan
>>
>>
>> On Saturday, May 14, 2016, Ramesh Krishnan <ramesh.154089@gmail.com>
>> wrote:
>>
>>> PFA the correct logs for the concurrent exception
>>>
>>> 2016-05-14 19:10:55,733 ERROR [ooc-io-0] org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
>>> java.lang.RuntimeException: java.io.EOFException
>>> 	at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:76)
>>> 	at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:30)
>>> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> 	at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.io.EOFException
>>> 	at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>> 	at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:47)
>>> 	at org.apache.giraph.ooc.data.DiskBackedPartitionStore.readOutEdges(DiskBackedPartitionStore.java:286)
>>> 	at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadInMemoryPartitionData(DiskBackedPartitionStore.java:329)
>>> 	at org.apache.giraph.ooc.data.OutOfCoreDataManager.loadPartitionData(OutOfCoreDataManager.java:195)
>>> 	at org.apache.giraph.ooc.data.DiskBackedPartitionStore.loadPartitionData(DiskBackedPartitionStore.java:360)
>>> 	at org.apache.giraph.ooc.io.LoadPartitionIOCommand.execute(LoadPartitionIOCommand.java:64)
>>> 	at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:72)
>>> 	... 6 more
>>> 2016-05-14 19:10:55,737 INFO [ooc-io-0] org.apache.giraph.ooc.OutOfCoreIOCallableFactory: afterExecute: an out-of-core thread terminated unexpectedly with java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.EOFException
>>> 2016-05-14 19:10:55,739 INFO [checkpoint-vertices-7] org.apache.giraph.ooc.FixedOutOfCoreEngine: getNextPartition: waiting until a partition becomes available!
>>> 2016-05-14 19:10:56,426 ERROR [checkpoint-vertices-6] org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
>>> java.lang.RuntimeException: Job Failed due to a failure in an out-of-core IO thread
>>> 	at org.apache.giraph.ooc.FixedOutOfCoreEngine.getNextPartition(FixedOutOfCoreEngine.java:81)
>>> 	at org.apache.giraph.ooc.data.DiskBackedPartitionStore.getNextPartition(DiskBackedPartitionStore.java:187)
>>> 	at org.apache.giraph.worker.BspServiceWorker$3$1.call(BspServiceWorker.java:1398)
>>> 	at org.apache.giraph.worker.BspServiceWorker$3$1.call(BspServiceWorker.java:1392)
>>> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> 	at java.lang.Thread.run(Thread.java:745)
>>>
>>>
>>>
>>> On Sun, May 15, 2016 at 1:02 AM, Ramesh Krishnan <
>>> ramesh.154089@gmail.com> wrote:
>>>
>>>>
>>>> Hi Team,
>>>>
>>>> I have the latest build of Giraph running on a 5-node cluster. When I
>>>> try to use the out-of-core graph option on a large data set (about 600
>>>> million edges), I run into the following exception. Please find below
>>>> the script being executed and the exception logs. I have tried
>>>> everything I could think of and could not avoid this issue, so I would
>>>> really appreciate your help.
>>>>
>>>> *Script:*hadoop jar
>>>> /usr/local/giraph.back.1.2.0/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.7.0-jar-with-dependencies.jar
>>>> org.apache.giraph.GiraphRunner -Dmapreduce.task.timeout=12000000
>>>> -Dmapred.job.tracker=ip-172-31-42-220.eu-west-1.compute.internal:8021
>>>> -Dmapreduce.map.memory.mb=23480 -Dmapreduce.map.java.opts=-Xmx22480m
>>>> org.apache.giraph.examples.ConnectedComponentsComputation   -vif
>>>> org.apache.giraph.io.formats.IntIntNullTextInputFormat -vip /test/input_10M
>>>> -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>>>> /test/ouput_10M -w 5 -ca
>>>> giraph.userPartitionCount=150,giraph.SplitMasterWorker=true,giraph.isStaticGraph=true,giraph.maxPartitionsInMemory=10,mapred.map.max.attempts=2,giraph.maxMessagesInMemory=100,giraph.numOutputThreads=10,giraph.useOutOfCoreMessages=true,giraph.numOutputThreads=4,giraph.numInputThreads=4,giraph.useOutOfCoreGraph=true,giraph.cleanupCheckpointsAfterSuccess=true,giraph.checkpointFrequency=1
>>>>
>>>>
>>>>
>>>>
>>>> *Exception:hadoop jar
>>>> /usr/local/giraph.back.1.2.0/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.7.0-jar-with-dependencies.jar
>>>> org.apache.giraph.GiraphRunner -Dmapreduce.task.timeout=12000000
>>>> -Dmapred.job.tracker=ip-172-31-42-220.eu-west-1.compute.internal:8021
>>>> -Dmapreduce.map.memory.mb=23480 -Dmapreduce.map.java.opts=-Xmx22480m
>>>> org.apache.giraph.examples.ConnectedComponentsComputation   -vif
>>>> org.apache.giraph.io.formats.IntIntNullTextInputFormat -vip /test/input_10M
>>>> -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>>>> /test/ouput_10M -w 5 -ca
>>>> giraph.userPartitionCount=150,giraph.SplitMasterWorker=true,giraph.isStaticGraph=true,giraph.maxPartitionsInMemory=10,mapred.map.max.attempts=2,giraph.maxMessagesInMemory=100,giraph.numOutputThreads=10,giraph.useOutOfCoreMessages=true,giraph.numOutputThreads=4,giraph.numInputThreads=4,giraph.useOutOfCoreGraph=true,giraph.cleanupCheckpointsAfterSuccess=true,giraph.checkpointFrequency=1*
>>>>
>>>> *thanks*
>>>>
>>>> *Ramesh*
>>>>
>>>>
>>>
>
