hadoop-mapreduce-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Out of Memory during Reduce Merge
Date Sun, 13 Jun 2010 21:06:23 GMT
Can you increase -Xmx to 2048m?

I suggest using the following flags:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/hadoop/hprof
-XX:+UseConcMarkSweepGC -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
-XX:MaxPermSize=512m -XX:+PrintTenuringDistribution
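
For example, these might be combined with the larger heap in mapred-site.xml
(a sketch, not a tested recommendation; the 2048m heap and the dump directory
are assumptions to adapt, and the HeapDumpPath directory must already exist on
each task node). The value is kept on a single line because the child opts
string is split on spaces:

<property>
  <name>mapred.child.java.opts</name>
  <!-- sketch: larger heap plus heap-dump-on-OOME and GC diagnostics (values assumed) -->
  <value>-Xmx2048m -XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/hadoop/hprof -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution</value>
</property>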

Once you obtain the heap dump, you can use jhat to analyze it.
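
For example (the file name is illustrative; the JVM names dumps
java_pid<pid>.hprof under the HeapDumpPath directory):

jhat -J-Xmx2048m /usr/local/hadoop/hprof/java_pid12345.hprof
# jhat then serves the heap for browsing on http://localhost:7000 by default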

Or you can use YourKit to profile your reducer.

On Sun, Jun 13, 2010 at 12:31 PM, Ruben Quintero <rfq_dev@yahoo.com> wrote:

> We're using Hadoop 0.20.2. I saw that bug report (MAPREDUCE-1182), but our
> version is already patched for it.
>
> Are there any minor fixes or workarounds you can think of? Is there
> anything else I can send your way to give you more information as to what is
> happening?
>
> Thanks,
>
> - Ruben
>
>
> ------------------------------
> *From:* Ted Yu <yuzhihong@gmail.com>
> *To:* mapreduce-user@hadoop.apache.org
> *Sent:* Sun, June 13, 2010 10:09:27 AM
> *Subject:* Re: Out of Memory during Reduce Merge
>
> From the stack trace, these are two different problems. MAPREDUCE-1182
> <https://issues.apache.org/jira/browse/MAPREDUCE-1182> didn't solve all
> issues w.r.t. shuffling. See the 'Shuffle In Memory OutOfMemoryError'
> discussion, where an OOME was reported even after MAPREDUCE-1182 had been
> incorporated.
>
> On Sun, Jun 13, 2010 at 2:47 AM, Guo Leitao <leitao.guo@gmail.com> wrote:
>
>> Is this the same scenario?
>> https://issues.apache.org/jira/browse/MAPREDUCE-1182
>>
>>
>> 2010/6/12 Ruben Quintero <rfq_dev@yahoo.com>
>>
>>> Hi all,
>>>
>>> We have a MapReduce job writing a Lucene index (modeled closely after the
>>> example in contrib), and we keep hitting out-of-memory exceptions in the
>>> reduce phase once the number of files grows large.
>>>
>>> Here are the relevant non-default values in our mapred-site.xml:
>>>
>>> mapred.child.java.opts: -Xmx1024M -XX:+UseConcMarkSweepGC
>>> mapred.reduce.parallel.copies: 20
>>> io.sort.factor: 100
>>> io.sort.mb: 200
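>>>
>>> (In mapred-site.xml itself, each of these is an ordinary property block,
>>> e.g.:
>>>
>>> <property>
>>>   <name>io.sort.factor</name>
>>>   <value>100</value>
>>> </property>
>>>
>>> and likewise for the other three.)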
>>>
>>> Looking through the output logs, I think the problem occurs here:
>>>
>>> INFO org.apache.hadoop.mapred.ReduceTask: Merging 7 files, 3005057577 bytes from disk
>>>
>>> From what I can tell, it is trying to merge more than twice the heap size
>>> from disk (3005057577 bytes is roughly 2.8 GB against a 1 GB heap), thus
>>> triggering the OOM. I've posted part of the log file below.
>>>
>>> We were looking at various MapReduce settings
>>> (mapred.job.shuffle.input.buffer.percent, mapred.job.shuffle.merge.percent,
>>> mapred.inmem.merge.threshold), but weren't sure which settings might cause
>>> this issue. Does anyone have any insight or suggestions as to where to start?
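>>>
>>> For reference, a sketch of how we might dial these down in mapred-site.xml
>>> (values are illustrative guesses, not tested recommendations; the 0.20
>>> defaults are 0.70, 0.66 and 1000, respectively):
>>>
>>> <property>
>>>   <name>mapred.job.shuffle.input.buffer.percent</name>
>>>   <value>0.50</value> <!-- illustrative: below the 0.70 default -->
>>> </property>
>>> <property>
>>>   <name>mapred.inmem.merge.threshold</name>
>>>   <value>500</value> <!-- illustrative: below the 1000 default -->
>>> </property>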
>>>
>>>
>>> Thank you,
>>>
>>> - Ruben
>>>
>>>
>>>
>>> 2010-06-11 15:27:11,519 INFO org.apache.hadoop.mapred.Merger: Merging 25 sorted segments
>>> 2010-06-11 15:27:11,528 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 25 segments left of total size: 321695767 bytes
>>> 2010-06-11 15:27:12,783 INFO org.apache.hadoop.mapred.ReduceTask: Merged 25 segments, 321695767 bytes to disk to satisfy reduce memory limit
>>> 2010-06-11 15:27:12,785 INFO org.apache.hadoop.mapred.ReduceTask: Merging 7 files, 3005057577 bytes from disk
>>> 2010-06-11 15:27:12,799 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
>>> 2010-06-11 15:27:12,799 INFO org.apache.hadoop.mapred.Merger: Merging 7 sorted segments
>>> 2010-06-11 15:27:20,891 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space
>>> 	at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:342)
>>> 	at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:404)
>>> 	at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
>>> 	at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:420)
>>> 	at org.apache.hadoop.mapred.Merger.merge(Merger.java:142)
>>> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2298)
>>> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:570)
>>> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
