Date: Thu, 08 Dec 2011 13:40:44 +0530
From: Devaraj K <devaraj.k@huawei.com>
Subject: RE: OOM Error Map output copy.
In-reply-to: <349A4555-E892-4A67-946D-D07C6732E960@cs.washington.edu>
To: common-user@hadoop.apache.org
Reply-to: devaraj.k@huawei.com
Message-id: <01076FA1E1ED423C94B90F46AB086928@china.huawei.com>
Organization: Htipl

Hi Niranjan,

	Everything looks OK based on the information you have given. Can you
check in the job.xml file whether these child opts are actually reflected
there, or whether something else is overriding this config?
	
3. mapred.child.java.opts --> -Xms512M -Xmx1536M -XX:+UseSerialGC
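
A minimal sketch of that check in Python, assuming job.xml uses the usual
Hadoop <configuration>/<property>/<name>/<value> layout. The helper name
read_job_conf and the sample XML are hypothetical stand-ins; the key names
are the ones quoted in this thread.

```python
import xml.etree.ElementTree as ET

# Keys worth confirming; the names are the settings quoted in this thread.
KEYS_TO_CHECK = [
    "mapred.child.java.opts",
    "mapred.job.shuffle.input.buffer.percent",
    "mapred.job.shuffle.merge.percent",
    "mapred.inmem.merge.threshold",
]


def read_job_conf(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config XML."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


# Stand-in for the contents of a real job.xml (illustrative only).
SAMPLE_JOB_XML = """<?xml version="1.0"?>
<configuration>
  <property><name>mapred.child.java.opts</name><value>-Xms512M -Xmx1536M -XX:+UseSerialGC</value></property>
  <property><name>mapred.job.shuffle.input.buffer.percent</name><value>0.70</value></property>
</configuration>"""

conf = read_job_conf(SAMPLE_JOB_XML)
for key in KEYS_TO_CHECK:
    # A missing key means the job fell back to a default defined elsewhere.
    print(key, "-->", conf.get(key, "<not set in job.xml>"))
```

To run it against a real job, read the job.xml text from wherever your
cluster stores it and pass that string to read_job_conf in place of the
sample.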


Also, can you tell me which version of Hadoop you are using?


Devaraj K 

-----Original Message-----
From: Niranjan Balasubramanian [mailto:niranjan@cs.washington.edu] 
Sent: Thursday, December 08, 2011 12:21 AM
To: common-user@hadoop.apache.org
Subject: OOM Error Map output copy.

All 

I am encountering the following out-of-memory error during the reduce phase
of a large job.

Map output copy failure : java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1669)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1529)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1378)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1310)

I tried increasing the available memory using mapred.child.java.opts, but
that helps only a little; the reduce task eventually fails again. Here are
some relevant job configuration details:

1. The input to the mappers is about 2.5 TB (LZO compressed). The mappers
filter out a small percentage of the input (less than 1%).

2. I am currently using 12 reducers, and I can't increase this count by
much because reduce slots must remain available for other users.

3. mapred.child.java.opts --> -Xms512M -Xmx1536M -XX:+UseSerialGC

4. mapred.job.shuffle.input.buffer.percent	--> 0.70

5. mapred.job.shuffle.merge.percent	--> 0.66

6. mapred.inmem.merge.threshold	--> 1000

7. I have nearly 5000 mappers, which are supposed to produce LZO-compressed
outputs. The logs seem to indicate that the map outputs range from 0.3 GB
to 0.8 GB.
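
As a back-of-the-envelope sketch of what settings 3 to 7 imply, the Python
below estimates the in-memory shuffle budget. It rests on several
assumptions that are not confirmed in this thread: that the usable heap
equals -Xmx, that a single map output is copied into memory only if it is
smaller than 25% of the shuffle buffer (the constant used in the Hadoop
0.20.x ReduceTask source, and the version here is not yet known), and that
each map's output is split evenly across the 12 reducers. The function
names are hypothetical.

```python
def shuffle_budget_mb(heap_mb, input_buffer_percent, merge_percent,
                      single_segment_fraction=0.25):
    """Estimate reduce-side shuffle memory limits, in MB.

    single_segment_fraction is an ASSUMPTION (0.25 as in Hadoop 0.20.x);
    check the ReduceTask source for the version actually in use.
    """
    buffer_mb = heap_mb * input_buffer_percent
    return {
        "shuffle_buffer_mb": buffer_mb,
        # In-memory merge starts once the buffer is this full.
        "merge_trigger_mb": buffer_mb * merge_percent,
        # Larger segments would be shuffled to disk instead of memory.
        "max_in_memory_segment_mb": buffer_mb * single_segment_fraction,
    }


def segment_range_mb(map_output_gb_min, map_output_gb_max, num_reducers):
    """Per-reducer slice of one map output, ASSUMING an even partition."""
    return (map_output_gb_min * 1024.0 / num_reducers,
            map_output_gb_max * 1024.0 / num_reducers)


budget = shuffle_budget_mb(heap_mb=1536, input_buffer_percent=0.70,
                           merge_percent=0.66)
seg_min_mb, seg_max_mb = segment_range_mb(0.3, 0.8, num_reducers=12)

for name, value in budget.items():
    print("%-26s %8.1f MB" % (name, value))
print("%-26s %8.1f to %.1f MB" % ("segment per reducer", seg_min_mb, seg_max_mb))
```

Under these assumptions the shuffle buffer is a little over 1 GB of the
1.5 GB heap, and only a limited number of segments of this size fit in it
before a merge must free space. A skewed partition would make some
segments much larger than the even-split estimate.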

Does anything here seem amiss? I'd appreciate any input on which settings
to try. I can try lower values for the input buffer percent and the merge
percent. Given that the job runs for about 7-8 hours before crashing, I
would like to make some informed choices if possible.

Thanks. 
~ Niranjan.