From: Niranjan Balasubramanian
Subject: OOM Error Map output copy.
Date: Wed, 7 Dec 2011 10:51:12 -0800
To: common-user@hadoop.apache.org

All,

I am encountering the following out-of-memory error during the reduce phase of a large job:

    Map output copy failure : java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1669)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1529)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1378)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1310)

I tried increasing the available memory using mapred.child.java.opts, but that only helps a little; the reduce task eventually fails again. Here are some relevant job configuration details:

1. The input to the mappers is about 2.5 TB (LZO compressed). The mappers filter out a small percentage of the input (less than 1%).
2. I am currently using 12 reducers, and I can't increase this count by much because reduce slots need to stay available for other users.
3. mapred.child.java.opts --> -Xms512M -Xmx1536M -XX:+UseSerialGC
4. mapred.job.shuffle.input.buffer.percent --> 0.70
5. mapred.job.shuffle.merge.percent --> 0.66
6. mapred.inmem.merge.threshold --> 1000
7. I have nearly 5000 mappers, which are supposed to produce LZO-compressed outputs. The logs seem to indicate that the map outputs range from 0.3 GB to 0.8 GB.

Does anything here seem amiss? I'd appreciate any input on what settings to try. I can try lower values for the input buffer percent and the merge percent.
Given that the job runs for about 7-8 hours before crashing, I would like to make informed choices if possible.

Thanks.
~ Niranjan
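To make the listed settings more concrete, here is a back-of-the-envelope sketch in Java of how they might translate into memory budgets inside one reduce task. It is a model under stated assumptions, not Hadoop's code: it assumes the Hadoop 0.20/1.x shuffle behavior summarized in the comments (in particular a 25% per-segment cap on in-memory copies), assumes each map's output is split evenly across the 12 reducers, and treats -Xmx as the usable heap even though Runtime.maxMemory() may report somewhat less. The class and method names are hypothetical.

```java
import java.util.Locale;

// Back-of-the-envelope model of the reduce-side shuffle memory budget.
// ASSUMPTIONS (not from the original message; based on Hadoop 0.20/1.x
// ReduceTask behavior and worth checking against the release in use):
//   * the in-memory shuffle buffer = max heap * mapred.job.shuffle.input.buffer.percent
//   * one map output is copied into memory only if it is smaller than 25% of
//     that buffer; larger outputs are shuffled to local disk instead
//   * an in-memory merge starts once the buffer is
//     mapred.job.shuffle.merge.percent full
class ShuffleBudget {
    // Assumed per-segment fraction (hypothetical name in this sketch).
    static final double SINGLE_SEGMENT_FRACTION = 0.25;

    private final double heapMb;        // from -Xmx in mapred.child.java.opts
    private final double bufferPercent; // mapred.job.shuffle.input.buffer.percent
    private final double mergePercent;  // mapred.job.shuffle.merge.percent

    ShuffleBudget(double heapMb, double bufferPercent, double mergePercent) {
        this.heapMb = heapMb;
        this.bufferPercent = bufferPercent;
        this.mergePercent = mergePercent;
    }

    // Heap set aside for holding fetched map outputs in memory.
    double bufferMb() { return heapMb * bufferPercent; }

    // Largest single segment that would be held in memory rather than on disk.
    double maxInMemorySegmentMb() { return bufferMb() * SINGLE_SEGMENT_FRACTION; }

    // Buffer occupancy at which an in-memory merge would be triggered.
    double mergeTriggerMb() { return bufferMb() * mergePercent; }

    // Heap left over for everything else in the reduce task JVM.
    double headroomMb() { return heapMb - bufferMb(); }

    boolean fitsInMemory(double segmentMb) { return segmentMb < maxInMemorySegmentMb(); }

    // Average share of one map's output fetched by each reducer,
    // assuming (hypothetically) an even partition across reducers.
    static double perReducerSegmentMb(double mapOutputMb, int reducers) {
        return mapOutputMb / reducers;
    }

    public static void main(String[] args) {
        // Values from the message: -Xmx1536M, buffer 0.70, merge 0.66, 12 reducers.
        ShuffleBudget current = new ShuffleBudget(1536, 0.70, 0.66);
        System.out.println(String.format(Locale.ROOT,
            "buffer=%.1f MB, max in-memory segment=%.1f MB, merge at %.1f MB, headroom=%.1f MB",
            current.bufferMb(), current.maxInMemorySegmentMb(),
            current.mergeTriggerMb(), current.headroomMb()));

        // Map outputs reported as 0.3 GB to 0.8 GB (1 GB taken as 1024 MB).
        for (double mapOutputGb : new double[] {0.3, 0.8}) {
            double segMb = perReducerSegmentMb(mapOutputGb * 1024, 12);
            System.out.println(String.format(Locale.ROOT,
                "map output %.1f GB -> ~%.1f MB per reducer, held in memory: %b",
                mapOutputGb, segMb, current.fitsInMemory(segMb)));
        }
    }
}
```

With the posted values the model gives a shuffle buffer of about 1075 MB, leaving roughly 461 MB of the 1536 MB heap for everything else. Assuming the in-memory check is made against the decompressed length, the LZO-compressed sizes in the logs would understate what the copiers actually hold in memory. Lowering mapred.job.shuffle.input.buffer.percent, which the message already considers, shrinks both the buffer and the per-segment cap in this model.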