hadoop-common-user mailing list archives

From Keith Wiley <kwi...@keithwiley.com>
Subject Force job to use all reducers evenly?
Date Fri, 25 Mar 2011 07:18:23 GMT
Say my mappers produce at most (or precisely) 4 output keys, and say I designate the job to have
at least (or precisely) 4 reducers.  I have noticed that it is not guaranteed that all four
reducers will be used, one per key.  Rather, it is quite likely that one reducer won't be
used at all while another receives two keys' worth of work: first all the values for one
key, then all the values for the other.
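(For anyone wondering why this happens: Hadoop's stock HashPartitioner assigns a key to reducer (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so two of the four keys can easily land in the same bucket.  Here is a tiny self-contained sketch of that arithmetic; the single-letter keys are made up for illustration, and String.hashCode() stands in for Text.hashCode(), which hashes the UTF-8 bytes and can differ.)

```java
public class HashCollisionDemo {
    // Mirrors the arithmetic of Hadoop's default HashPartitioner:
    // (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Hypothetical keys: "A" (hash 65) and "E" (hash 69) both land
        // on reducer 1 when there are 4 reducers, leaving reducer 0 idle.
        for (String k : new String[] {"A", "B", "C", "E"}) {
            System.out.println(k + " -> reducer " + partition(k, 4));
        }
    }
}
```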

This has horrible implications for parallel performance, of course: with one reducer handling
two keys' worth of values, it effectively doubles the theoretically optimal reduce-phase time.

I have been told that the only way to achieve a more ideal distribution of work is to write
my own partitioner.  I'm willing to do that (we've done it before within our group on this
project), but I don't want to do any unnecessary work.  I'm mildly surprised that there isn't
a configuration setting that will achieve my desired goal here.  Was the advice I received
correct?  Can my goal only be achieved by writing a fresh partitioner from scratch?
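(For what it's worth, if the four keys are known in advance, the core of such a partitioner is only a few lines.  Below is a plain-Java sketch of the mapping logic; the key names are invented for illustration.  In a real job this logic would live inside a class extending org.apache.hadoop.mapreduce.Partitioner, registered via job.setPartitionerClass(...).)

```java
import java.util.HashMap;
import java.util.Map;

public class FixedKeyPartitioner {
    // Explicit key -> reducer assignment, assuming the key set is known
    // up front.  These key names are hypothetical.
    private static final Map<String, Integer> SLOTS = new HashMap<String, Integer>();
    static {
        SLOTS.put("north", 0);
        SLOTS.put("south", 1);
        SLOTS.put("east", 2);
        SLOTS.put("west", 3);
    }

    // In a real job this body would implement
    // Partitioner#getPartition(KEY key, VALUE value, int numPartitions).
    public static int getPartition(String key, int numPartitions) {
        Integer slot = SLOTS.get(key);
        if (slot == null) {
            // Fall back to hash partitioning for any unexpected key.
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
        return slot % numPartitions;
    }

    public static void main(String[] args) {
        for (String k : new String[] {"north", "south", "east", "west"}) {
            System.out.println(k + " -> reducer " + getPartition(k, 4));
        }
    }
}
```

With 4 reducers, each known key gets its own reducer, which is exactly the even spread described above.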


Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"I do not feel obliged to believe that the same God who has endowed us with
sense, reason, and intellect has intended us to forgo their use."
                                           --  Galileo Galilei
