Subject: Re: OutOfMemoryError during reduce shuffle
From: Hemanth Yamijala <hemanty@thoughtworks.com>
To: user@hadoop.apache.org
Date: Thu, 21 Feb 2013 07:11:28 +0530

There are a few configuration tweaks that may help. Please take a look at
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Shuffle%2FReduce+Parameters

Also, since you have mentioned that the reducers are unbalanced, could you use a custom partitioner to balance out the outputs? Or simply increase the number of reducers so the load is spread out. (Sketches of both ideas follow below.)
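For concreteness, here is a minimal sketch, against the old mapred API that Hadoop 1.0.x uses, of setting the shuffle/reduce parameters from the tutorial link above in a job driver. The class name ShuffleTuning and the specific values are illustrative assumptions, not recommendations; start from the defaults and adjust for your reducer heap size.

import org.apache.hadoop.mapred.JobConf;

public class ShuffleTuning {
    // Sketch: apply shuffle/reduce memory settings to a job's configuration.
    public static void tune(JobConf conf) {
        // Fraction of the reducer heap used to buffer map outputs during the
        // shuffle (default 0.70); lowering it leaves more headroom.
        conf.set("mapred.job.shuffle.input.buffer.percent", "0.50");
        // Buffer usage at which an in-memory merge to disk is started
        // (default 0.66).
        conf.set("mapred.job.shuffle.merge.percent", "0.50");
        // Number of in-memory map outputs that triggers a merge to disk
        // (default 1000).
        conf.setInt("mapred.inmem.merge.threshold", 500);
        // Fraction of heap allowed to retain map outputs while the reduce
        // itself runs (default 0.0, i.e. spill everything to disk first).
        conf.set("mapred.job.reduce.input.buffer.percent", "0.0");
    }
}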
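And a hedged sketch of the partitioner idea, again with the old mapred API. SaltedPartitioner and HOT_KEY are hypothetical names; the assumption is that you know which key (or keys) is skewed.

import java.util.Random;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class SaltedPartitioner implements Partitioner<Text, Text> {
    private static final String HOT_KEY = "hot"; // hypothetical skewed key
    private final Random random = new Random();

    public void configure(JobConf job) {
        // No configuration needed for this sketch.
    }

    public int getPartition(Text key, Text value, int numPartitions) {
        if (HOT_KEY.equals(key.toString())) {
            // Scatter the hot key uniformly instead of hashing it to a
            // single overloaded reducer.
            return random.nextInt(numPartitions);
        }
        // Default hash partitioning for everything else.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

Register it with conf.setPartitionerClass(SaltedPartitioner.class). One caveat: scattering a single key across reducers is only safe when the reduce output can be recombined afterwards (e.g. a sum or count aggregated by a small follow-up job); if it can't, simply raising the number of reducers is the safer first step.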
Thanks,
Hemanth

On Wednesday, February 20, 2013, Shivaram Lingamneni wrote:

> I'm experiencing the following crash during reduce tasks:
>
> https://gist.github.com/slingamn/04ff3ff3412af23aa50d
>
> on Hadoop 1.0.3 (specifically I'm using Amazon's EMR, AMI version
> 2.2.1). The crash is triggered by especially unbalanced reducer
> inputs, i.e., when one reducer receives too many records. (The reduce
> task gets retried three times, but since the data is the same every
> time, it crashes each time in the same place and the job fails.)
>
> From the following links:
>
> https://issues.apache.org/jira/browse/MAPREDUCE-1182
>
> http://hadoop-common.472056.n3.nabble.com/Shuffle-In-Memory-OutOfMemoryError-td433197.html
>
> it seems as though Hadoop is supposed to prevent this from happening
> by intelligently managing the amount of memory that is provided to the
> shuffle. However, I don't know how ironclad this guarantee is.
>
> Can anyone advise me on how robust I can expect Hadoop to be to this
> issue, in the face of highly unbalanced reducer inputs? Thanks very
> much for your time.