hadoop-common-user mailing list archives

From Stuart White <stuart.whi...@gmail.com>
Subject Confused about partitioning and reducers
Date Sat, 27 Jun 2009 15:25:15 GMT
If I call HashPartitioner.getPartition(), passing a key of 4 and a
numPartitions of 5, it returns a partition of 4 (which is what I would expect).
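Here's a minimal standalone check that reproduces that result. I'm assuming an
IntWritable key and the old org.apache.hadoop.mapred API here; the key type and
class name PartitionCheck are just for illustration:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.HashPartitioner;

    public class PartitionCheck {
        public static void main(String[] args) {
            // HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numPartitions.
            // IntWritable.hashCode() is just the wrapped int, so 4 % 5 == 4.
            HashPartitioner<IntWritable, Text> partitioner =
                new HashPartitioner<IntWritable, Text>();
            int partition = partitioner.getPartition(new IntWritable(4), new Text("dummy"), 5);
            System.out.println(partition);  // prints 4
        }
    }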

However, when I run a mapred job in which my mapper emits a record with key 4,
the job is configured to use the HashPartitioner with 5 reducers, and the
reducer is the IdentityReducer, the record with key 4 gets handled by
reducer #0 (I can tell because it gets written out to part-00000).
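Roughly, the job is wired up like this (a sketch of the driver, not the exact
code; MyJob, MyMapper, and the input/output paths are placeholders):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.HashPartitioner;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    JobConf conf = new JobConf(MyJob.class);          // MyJob is a placeholder driver class
    conf.setMapperClass(MyMapper.class);              // placeholder mapper; emits an IntWritable key of 4
    conf.setReducerClass(IdentityReducer.class);      // pass records through unchanged
    conf.setPartitionerClass(HashPartitioner.class);  // hash-based partitioning
    conf.setNumReduceTasks(5);                        // 5 reducers -> part-00000 .. part-00004
    conf.setOutputKeyClass(IntWritable.class);
    conf.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(conf, new Path("input"));
    FileOutputFormat.setOutputPath(conf, new Path("output"));
    JobClient.runJob(conf);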

I would have expected a record with key 4 to be handled by reducer #4 (and
therefore written to part-00004) because the HashPartitioner returns 4 for a
key of 4 and a numPartitions of 5.

Obviously I'm missing something here.  What is the logic for deciding which
partition of records is handled by which reducer instance?

It can't be random, otherwise map-side join wouldn't work.

