hadoop-mapreduce-user mailing list archives

From Hemanth Yamijala <yhema...@thoughtworks.com>
Subject Re: OutOfMemoryError during reduce shuffle
Date Thu, 21 Feb 2013 01:41:28 GMT
There are a few configuration tweaks that may help. Can you please take a look
at
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Shuffle%2FReduce+Parameters
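For example, a rough sketch of how those knobs could be set from the job driver
(Hadoop 1.0.x old-API key names as listed on that page; the values are only
illustrative starting points to experiment with, not recommendations):

    import org.apache.hadoop.mapred.JobConf;

    public class ShuffleTuningExample {
      public static JobConf tunedConf() {
        JobConf conf = new JobConf();
        // Hold a smaller fraction of the reducer heap for shuffled map
        // outputs (the 1.0.x default is 0.70).
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.50f);
        // Start merging / spilling to disk once the in-memory shuffle
        // buffer is half full (default 0.66).
        conf.setFloat("mapred.job.shuffle.merge.percent", 0.50f);
        // Don't retain map outputs in memory once the reduce phase
        // begins (default 0.0).
        conf.setFloat("mapred.job.reduce.input.buffer.percent", 0.0f);
        return conf;
      }
    }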

Also, since you have mentioned that the reducer inputs are unbalanced, could you
use a custom partitioner to balance out the outputs (a sketch follows below)?
Or simply increase the number of reducers so the load is spread out.
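Something along these lines, for instance (old-API partitioner, Text keys and
values assumed; SkewAwarePartitioner and the hot key below are placeholders for
your job's specifics, and splitting a key across reducers only works if your
reduce logic can tolerate it, e.g. with a follow-up aggregation pass):

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Hypothetical sketch: fan a known hot key out over all reducers by
    // hashing the value as well; everything else keeps the usual
    // hash-on-key behaviour.
    public class SkewAwarePartitioner implements Partitioner<Text, Text> {
      private static final String HOT_KEY = "hot-key"; // placeholder

      public void configure(JobConf job) { }

      public int getPartition(Text key, Text value, int numPartitions) {
        if (key.toString().equals(HOT_KEY)) {
          // Spread the hot key's records across all reducers.
          return (value.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
        // Default: partition on the key only.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
      }
    }

    // Register it on the job:
    //   conf.setPartitionerClass(SkewAwarePartitioner.class);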

Thanks
Hemanth

On Wednesday, February 20, 2013, Shivaram Lingamneni wrote:

> I'm experiencing the following crash during reduce tasks:
>
> https://gist.github.com/slingamn/04ff3ff3412af23aa50d
>
> on Hadoop 1.0.3 (specifically I'm using Amazon's EMR, AMI version
> 2.2.1). The crash is triggered by especially unbalanced reducer
> inputs, i.e., when one reducer receives too many records. (The reduce
> task gets retried three times, but since the data is the same every
> time, it crashes each time in the same place and the job fails.)
>
> From the following links:
>
> https://issues.apache.org/jira/browse/MAPREDUCE-1182
>
>
> http://hadoop-common.472056.n3.nabble.com/Shuffle-In-Memory-OutOfMemoryError-td433197.html
>
> it seems as though Hadoop is supposed to prevent this from happening
> by intelligently managing the amount of memory that is provided to the
> shuffle. However, I don't know how ironclad this guarantee is.
>
> Can anyone advise me on how robust I can expect Hadoop to be to this
> issue, in the face of highly unbalanced reducer inputs? Thanks very
> much for your time.
>
