hadoop-mapreduce-user mailing list archives

From Aaron Baff <Aaron.B...@telescope.tv>
Subject RE: Reduce method called same key twice
Date Wed, 29 Jun 2011 18:07:16 GMT
You probably need to implement a custom comparator that you use as the grouping comparator
that compares the primary key, and then if they are the same compares the int part of the


From: Trevor Adams [mailto:trevoradams@gmail.com]
Sent: Wednesday, June 29, 2011 10:00 AM
To: mapreduce-user@hadoop.apache.org
Subject: Reduce method called same key twice

So I have a custom Key which is used for a join. It contains two fields, a boolean (is primary
key) and an int (key). Hashcode only looks at the key field, so that it gets sent to the same
reducer. Compare places the pkey at the top of the list (if sorted using compare). This works
nicely, except that the reduce method is called with Key: 1 -> a single value, Key: 1 ->
another value etc. One for each value, so instead of bucketing the values to a key (and some
of the keys are identical, in every way) it sends 1 key and 1 value to the reducer at a time.
How do I get it to bucket or why isn't it bucketing?
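
For illustration, here is a minimal plain-Java sketch of how grouping at the reducer depends on which comparator decides that two adjacent sorted keys are "the same". It uses no Hadoop classes: JoinKey, SORT, GROUP and groupSizes are hypothetical names made up for this sketch, and the sort order is only assumed from the description above. In a real job the grouping logic would live in a RawComparator registered with Job.setGroupingComparatorClass (or JobConf.setOutputValueGroupingComparator in the older API).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of sort-then-group; no Hadoop classes are involved.
class GroupingSketch {

    // Hypothetical stand-in for the custom join key described in the message.
    static final class JoinKey {
        final boolean primary; // true for the primary-key side of the join
        final int key;         // the join key itself

        JoinKey(boolean primary, int key) {
            this.primary = primary;
            this.key = key;
        }
    }

    // Assumed sort order: by int key, then primary records first within a key.
    static final Comparator<JoinKey> SORT = (a, b) -> {
        int c = Integer.compare(a.key, b.key);
        if (c != 0) {
            return c;
        }
        return Boolean.compare(b.primary, a.primary); // true sorts before false
    };

    // Grouping order: looks at the int key only and ignores the boolean flag.
    static final Comparator<JoinKey> GROUP =
            (a, b) -> Integer.compare(a.key, b.key);

    // Sorts the keys with `sort`, then starts a new group whenever `group`
    // says two adjacent keys differ. Returns the size of each group in order.
    static List<Integer> groupSizes(List<JoinKey> keys,
                                    Comparator<JoinKey> sort,
                                    Comparator<JoinKey> group) {
        List<JoinKey> sorted = new ArrayList<>(keys);
        sorted.sort(sort);
        List<Integer> sizes = new ArrayList<>();
        JoinKey prev = null;
        for (JoinKey k : sorted) {
            if (prev == null || group.compare(prev, k) != 0) {
                sizes.add(1); // adjacent keys differ: open a new group
            } else {
                int last = sizes.size() - 1;
                sizes.set(last, sizes.get(last) + 1); // same group as before
            }
            prev = k;
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<JoinKey> keys = Arrays.asList(
                new JoinKey(true, 1), new JoinKey(false, 1), new JoinKey(false, 1),
                new JoinKey(false, 2), new JoinKey(true, 2));

        // Grouping on the int key only: one group per join key.
        System.out.println(groupSizes(keys, SORT, GROUP)); // prints [3, 2]

        // Grouping with the full sort comparator: the flag splits each key.
        System.out.println(groupSizes(keys, SORT, SORT));  // prints [1, 2, 1, 1]
    }
}
```

Note that in this simulation fully identical keys still land in one group even when the sort comparator is used for grouping. If identical keys are being split in the real job, one possible cause (an assumption, not something the thread confirms) is a compare method that never returns 0.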

