Subject: Re: OutOfMemoryError during reduce shuffle
From: Hemanth Yamijala <hemanty@thoughtworks.com>
To: user@hadoop.apache.org
Date: Thu, 21 Feb 2013 07:11:28 +0530

There are a few configuration tweaks that may help. Please take a look at
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Shuffle%2FReduce+Parameters

Also, since you have mentioned that the reducers are unbalanced, could you use a custom partitioner to balance out the outputs? Or simply increase the number of reducers so the load is spread out. (Sketches of both ideas follow below.)
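For concreteness, here is a minimal sketch, against the old mapred API that Hadoop 1.0.x uses, of setting the shuffle/reduce parameters from the tutorial link above in a job driver. The class name ShuffleTuning and the specific values are illustrative assumptions, not recommendations; start from the defaults and adjust for your reducer heap size.

import org.apache.hadoop.mapred.JobConf;

public class ShuffleTuning {
    // Sketch: apply shuffle/reduce memory settings to a job's configuration.
    public static void tune(JobConf conf) {
        // Fraction of the reducer heap used to buffer map outputs during the
        // shuffle (default 0.70); lowering it leaves more headroom.
        conf.set("mapred.job.shuffle.input.buffer.percent", "0.50");
        // Buffer usage at which an in-memory merge to disk is started
        // (default 0.66).
        conf.set("mapred.job.shuffle.merge.percent", "0.50");
        // Number of in-memory map outputs that triggers a merge to disk
        // (default 1000).
        conf.setInt("mapred.inmem.merge.threshold", 500);
        // Fraction of heap allowed to retain map outputs while the reduce
        // itself runs (default 0.0, i.e. spill everything to disk first).
        conf.set("mapred.job.reduce.input.buffer.percent", "0.0");
    }
}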
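And a hedged sketch of the partitioner idea, again with the old mapred API. SaltedPartitioner and HOT_KEY are hypothetical names; the assumption is that you know which key (or keys) is skewed.

import java.util.Random;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class SaltedPartitioner implements Partitioner<Text, Text> {
    private static final String HOT_KEY = "hot"; // hypothetical skewed key
    private final Random random = new Random();

    public void configure(JobConf job) {
        // No configuration needed for this sketch.
    }

    public int getPartition(Text key, Text value, int numPartitions) {
        if (HOT_KEY.equals(key.toString())) {
            // Scatter the hot key uniformly instead of hashing it to a
            // single overloaded reducer.
            return random.nextInt(numPartitions);
        }
        // Default hash partitioning for everything else.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

Register it with conf.setPartitionerClass(SaltedPartitioner.class). One caveat: scattering a single key across reducers is only safe when the reduce output can be recombined afterwards (e.g. a sum or count aggregated by a small follow-up job); if it can't, simply raising the number of reducers is the safer first step.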
Thanks,
Hemanth

On Wednesday, February 20, 2013, Shivaram Lingamneni wrote:

> I'm experiencing the following crash during reduce tasks:
>
> https://gist.github.com/slingamn/04ff3ff3412af23aa50d
>
> on Hadoop 1.0.3 (specifically I'm using Amazon's EMR, AMI version
> 2.2.1). The crash is triggered by especially unbalanced reducer
> inputs, i.e., when one reducer receives too many records. (The reduce
> task gets retried three times, but since the data is the same every
> time, it crashes each time in the same place and the job fails.)
>
> From the following links:
>
> https://issues.apache.org/jira/browse/MAPREDUCE-1182
>
> http://hadoop-common.472056.n3.nabble.com/Shuffle-In-Memory-OutOfMemoryError-td433197.html
>
> it seems as though Hadoop is supposed to prevent this from happening
> by intelligently managing the amount of memory that is provided to the
> shuffle. However, I don't know how ironclad this guarantee is.
>
> Can anyone advise me on how robust I can expect Hadoop to be to this
> issue, in the face of highly unbalanced reducer inputs? Thanks very
> much for your time.