hadoop-common-user mailing list archives

From Rob Stewart <robstewar...@googlemail.com>
Subject Re: Slow final few reducers
Date Sat, 11 Dec 2010 11:38:31 GMT
Hi, many thanks for your response.

A few observations:
- I know for a fact that my key distribution is quite radically skewed
(some keys with *many* values, most keys with few).
- I have overlooked the fact that I need a custom partitioner. I suspect
that this will help dramatically.

I realize that the number of partitions should equal the number of
reducers (e.g. 100).

So if these are my <key>,<value> pairs (where the value is a count):
<the cat>,<20>
<the cat sat on the mat>,<1>

and I have 3 reducers, how do I make:
Reducer-1: <the>
Reducer-2: <a>
Reducer-3: <the cat> & <the cat sat on the mat>
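
One way to get a layout like the one above is a partition function that
reserves one reducer for each known heavy key and hashes every other key
across the remaining reducers. The following is only a minimal sketch in
plain Java, not a Hadoop class: the class name SkewAwarePartitionSketch
and the HEAVY_KEYS list are hypothetical, and the heavy keys are assumed
to be known in advance. Partition indices are zero-based, so index 0
corresponds to Reducer-1. The masked-hash-modulo arithmetic is the same
one Hadoop's default HashPartitioner uses.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of a skew-aware partition function. In a real job this
// logic would sit inside the getPartition method of a Partitioner subclass.
class SkewAwarePartitionSketch {

    // Hypothetical list of keys known to carry many values.
    // The key at position i is pinned to reducer index i.
    private static final List<String> HEAVY_KEYS = Arrays.asList("the", "a");

    // Returns a reducer index in the range [0, numReduceTasks).
    static int getPartition(String key, int numReduceTasks) {
        int reserved = HEAVY_KEYS.size();

        // Too few reducers to reserve any: fall back to plain hashing.
        if (numReduceTasks <= reserved) {
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }

        // A heavy key gets a reducer of its own.
        int pinned = HEAVY_KEYS.indexOf(key);
        if (pinned >= 0) {
            return pinned;
        }

        // Every other key is hashed over the reducers that are not reserved.
        // Masking with Integer.MAX_VALUE clears the sign bit so the modulus
        // is never negative.
        int shared = numReduceTasks - reserved;
        return reserved + (key.hashCode() & Integer.MAX_VALUE) % shared;
    }
}
```

With 3 reducers, <the> goes to index 0, <a> to index 1, and both
<the cat> and <the cat sat on the mat> go to index 2. In an actual job
the same logic would presumably be registered through the job's
setPartitionerClass setting; whether pinning keys this way actually
evens out the load depends on how heavy the pinned keys are.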



On 11 December 2010 11:12, Harsh J <qwertymaniac@gmail.com> wrote:
> Hi,
> Certain reducers may receive a higher share of data than others
> (Depending on your data/key distribution, the partition function,
> etc.). Compare the longer reduce tasks' counters with the quicker
> ones.
> Are you sure that the reducers that take long are definitely the last
> wave, as in with IDs of 180-200 (and not a random bunch of reduce
> tasks taking longer)?
> Also take a look at the logs, and the machines that run these
> particular reducers -- ensure nothing is wrong on them.
> There's nothing specifically written in Hadoop for the "last wave" of
> Reduce tasks to take longer. Each reducer writes to its own file, and
> is completely independent.
> --
> Harsh J
> www.harshj.com
