hadoop-mapreduce-user mailing list archives

From: Ted Yu <yuzhih...@gmail.com>
Subject: Re: Out of Memory during Reduce Merge
Date: Mon, 14 Jun 2010 22:16:56 GMT
/usr/local/hadoop/hprof was an example directory name.
You need to find/create a path that exists on every task tracker node.

The flags can be specified in mapred.child.java.opts.
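
For example, a minimal sketch of that entry in mapred-site.xml (the dump path below is only a placeholder; substitute a directory that exists and is writable on every task tracker):

<property>
  <name>mapred.child.java.opts</name>
  <!-- the HeapDumpPath below is a placeholder; point it at a directory present on every node -->
  <value>-Xmx1024m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/on/every/tasktracker</value>
</property>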


On Mon, Jun 14, 2010 at 1:52 PM, Ruben Quintero <rfq_dev@yahoo.com> wrote:

> Increasing heap size seems like it would simply be delaying the problem
> versus addressing it (i.e. with a larger load, it would OOM again). Our
> memory use is tight as it is, so I don't think we could bump it up to 2G.
>
> As for the heap dump, I tried using the flags, but I guess I ran into
> HADOOP-4953 <https://issues.apache.org/jira/browse/HADOOP-4953>, as it
> would take all the arguments for the child and simply put in the heap size
> and UseConcMarkSweepGC (even when the latter wasn't specified, oddly enough). I
> reduced the arguments down to just -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/usr/local/hadoop/hprof, but it seems the processes are not
> dumping regardless, and I'm not sure why. Any ideas? I'm going to try more
> general heap dump requirements and see if I can get some logs.
>
> - Ruben
>
> ------------------------------
> *From:* Ted Yu <yuzhihong@gmail.com>
> *To:* mapreduce-user@hadoop.apache.org
> *Sent:* Sun, June 13, 2010 5:06:23 PM
>
> *Subject:* Re: Out of Memory during Reduce Merge
>
> Can you increase -Xmx to 2048m?
>
> I suggest using the following flags:
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/hadoop/hprof
> -XX:+UseConcMarkSweepGC -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
> -XX:MaxPermSize=512m -XX:+PrintTenuringDistribution
>
> Once you obtain a heap dump, you can use jhat to analyze it.
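>
> For example, a rough jhat invocation (the pid in the dump file name is a placeholder; the JVM picks the actual name):
>
> jhat -J-Xmx2g /usr/local/hadoop/hprof/java_pid12345.hprof   # file name is illustrative
>
> jhat then serves its analysis over HTTP on port 7000 by default.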
>
> Or you can use YourKit to profile your reducer.
>
> On Sun, Jun 13, 2010 at 12:31 PM, Ruben Quintero <rfq_dev@yahoo.com> wrote:
>
>> We're using Hadoop 0.20.2. I saw that bug report (1182), but found that our
>> version was already patched for it.
>>
>> Are there any minor fixes or workarounds you can think of? Is there
>> anything else I can send your way to give you more information as to what is
>> happening?
>>
>> Thanks,
>>
>> - Ruben
>>
>>
>> ------------------------------
>> *From:* Ted Yu <yuzhihong@gmail.com>
>> *To:* mapreduce-user@hadoop.apache.org
>> *Sent:* Sun, June 13, 2010 10:09:27 AM
>> *Subject:* Re: Out of Memory during Reduce Merge
>>
>> From the stack trace, these are two different problems.
>> MAPREDUCE-1182 <https://issues.apache.org/jira/browse/MAPREDUCE-1182> didn't
>> solve all issues w.r.t. shuffling.
>> See the 'Shuffle In Memory OutOfMemoryError' discussion, where OOME was
>> reported even when MAPREDUCE-1182 <https://issues.apache.org/jira/browse/MAPREDUCE-1182>
>> had been incorporated.
>>
>> On Sun, Jun 13, 2010 at 2:47 AM, Guo Leitao <leitao.guo@gmail.com> wrote:
>>
>>> Is this the same scenario?
>>> https://issues.apache.org/jira/browse/MAPREDUCE-1182
>>>
>>>
>>> 2010/6/12 Ruben Quintero <rfq_dev@yahoo.com>
>>>
>>>> Hi all,
>>>>
>>>> We have a MapReduce job writing a Lucene index (modeled closely after the
>>>> example in contrib), and we keep hitting out of memory exceptions in the
>>>> reduce phase once the number of files grows large.
>>>>
>>>>
>>>> Here are the relevant non-default values in our mapred-site.xml:
>>>>
>>>> mapred.child.java.opts: -Xmx1024M -XX:+UseConcMarkSweepGC
>>>> mapred.reduce.parallel.copies: 20
>>>> io.sort.factor: 100
>>>> io.sort.mb: 200
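>>>>
>>>> (For reference, a sketch of how these look as mapred-site.xml properties; the values are just the ones listed above:)
>>>>
>>>> <property>
>>>>   <name>mapred.child.java.opts</name>
>>>>   <value>-Xmx1024M -XX:+UseConcMarkSweepGC</value>
>>>> </property>
>>>> <property>
>>>>   <name>mapred.reduce.parallel.copies</name>
>>>>   <value>20</value>
>>>> </property>
>>>> <property>
>>>>   <name>io.sort.factor</name>
>>>>   <value>100</value>
>>>> </property>
>>>> <property>
>>>>   <name>io.sort.mb</name>
>>>>   <value>200</value>
>>>> </property>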
>>>>
>>>> Looking through the output logs, I think the problem occurs here:
>>>>
>>>>
>>>> INFO org.apache.hadoop.mapred.ReduceTask: Merging 7 files, 3005057577 bytes from disk
>>>>
>>>> From what I can tell, it is trying to merge more than twice the heapsize
>>>> from disk, thus triggering the OOM. I've posted part of the log file below.
>>>>
>>>>
>>>>
>>>> We were looking at various mapreduce settings
>>>> (mapred.job.shuffle.input.buffer.percent, mapred.job.shuffle.merge.percent,
>>>> mapred.inmem.merge.threshold), but weren't sure which settings might cause
>>>> this issue. Does anyone have any insight or suggestions as to where to start?
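>>>>
>>>> (For reference, a sketch of these shuffle settings with their approximate 0.20 defaults from mapred-default.xml:)
>>>>
>>>> <property>
>>>>   <name>mapred.job.shuffle.input.buffer.percent</name>
>>>>   <!-- fraction of the reducer heap used to buffer map outputs during shuffle -->
>>>>   <value>0.70</value>
>>>> </property>
>>>> <property>
>>>>   <name>mapred.job.shuffle.merge.percent</name>
>>>>   <!-- usage threshold at which an in-memory merge is started -->
>>>>   <value>0.66</value>
>>>> </property>
>>>> <property>
>>>>   <name>mapred.inmem.merge.threshold</name>
>>>>   <!-- number of in-memory map outputs that triggers a merge -->
>>>>   <value>1000</value>
>>>> </property>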
>>>>
>>>>
>>>>
>>>> Thank you,
>>>>
>>>> - Ruben
>>>>
>>>>
>>>>
>>>> 2010-06-11 15:27:11,519 INFO org.apache.hadoop.mapred.Merger: Merging 25 sorted segments
>>>> 2010-06-11 15:27:11,528 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 25 segments left of total size: 321695767 bytes
>>>>
>>>> 2010-06-11 15:27:12,783 INFO org.apache.hadoop.mapred.ReduceTask: Merged 25 segments, 321695767 bytes to disk to satisfy reduce memory limit
>>>> 2010-06-11 15:27:12,785 INFO org.apache.hadoop.mapred.ReduceTask: Merging 7 files, 3005057577 bytes from disk
>>>>
>>>> 2010-06-11 15:27:12,799 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
>>>> 2010-06-11 15:27:12,799 INFO org.apache.hadoop.mapred.Merger: Merging 7 sorted segments
>>>> 2010-06-11 15:27:20,891 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space
>>>>
>>>> 	at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:342)
>>>>
>>>>
>>>> 	at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:404)
>>>> 	at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
>>>> 	at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:420)
>>>> 	at org.apache.hadoop.mapred.Merger.merge(Merger.java:142)
>>>>
>>>> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2298)
>>>> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:570)
>>>> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>>>>
>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>
>>>
>>
>>
>
>
