hadoop-mapreduce-user mailing list archives

From Trevor Adams <trevorad...@gmail.com>
Subject Re: Reduce method called same key twice
Date Wed, 29 Jun 2011 18:34:24 GMT
So, that kind of makes sense, but then why would it not group the other values?
There are a bunch of records with exactly the same key (there is only 1 primary
record per set, so only 1 key in each set is different), and my understanding
is that they would be grouped together (apart from the primary key) even if I
didn't do anything different.

-Trevor
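To make the grouping question concrete, here is a minimal plain-Java simulation (no Hadoop dependency) of how the framework turns sorted map output into reduce() calls: it walks the sorted records and starts a new call whenever the grouping comparator says the current key differs from the previous one. It assumes the sort order described in the original question quoted below (join key first, primary record first). The record type `Rec`, the comparators `SORT`, `KEY_ONLY` and `NEVER_EQUAL`, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GroupingDemo {
    // Hypothetical stand-in for one map output record: the composite
    // join key (int key + primary flag) plus its value. Not a Hadoop type.
    static final class Rec {
        final int key;
        final boolean primary;
        final String value;

        Rec(int key, boolean primary, String value) {
            this.key = key;
            this.primary = primary;
            this.value = value;
        }
    }

    // Full-key ordering: by int key, then the primary record first.
    static final Comparator<Rec> SORT = (a, b) -> {
        int c = Integer.compare(a.key, b.key);
        if (c != 0) return c;
        return Boolean.compare(b.primary, a.primary);
    };

    // Grouping on the int key only, ignoring the primary flag.
    static final Comparator<Rec> KEY_ONLY = (a, b) -> Integer.compare(a.key, b.key);

    // Hypothetical broken comparator that never reports two keys as equal.
    static final Comparator<Rec> NEVER_EQUAL = (a, b) -> -1;

    // Sort the records, then start a new "reduce call" whenever the
    // grouping comparator says a record's key differs from the previous one.
    static List<List<String>> reduceCalls(List<Rec> records, Comparator<Rec> grouping) {
        List<Rec> sorted = new ArrayList<>(records);
        sorted.sort(SORT);
        List<List<String>> calls = new ArrayList<>();
        Rec prev = null;
        for (Rec r : sorted) {
            if (prev == null || grouping.compare(prev, r) != 0) {
                calls.add(new ArrayList<>());
            }
            calls.get(calls.size() - 1).add(r.value);
            prev = r;
        }
        return calls;
    }

    static List<Rec> sample() {
        return Arrays.asList(
            new Rec(1, false, "b"),
            new Rec(1, true, "a"),
            new Rec(1, false, "c"),
            new Rec(2, true, "d"),
            new Rec(2, false, "e"));
    }

    public static void main(String[] args) {
        System.out.println(reduceCalls(sample(), SORT));        // [[a], [b, c], [d], [e]]
        System.out.println(reduceCalls(sample(), KEY_ONLY));    // [[a, b, c], [d, e]]
        System.out.println(reduceCalls(sample(), NEVER_EQUAL)); // [[a], [b], [c], [d], [e]]
    }
}
```

With full-key grouping the primary record gets its own call, but identical secondary keys still share one; grouping on the int alone gives one call per join key; only a comparator that never returns 0 produces one call per value. So if identical keys are not being grouped, it may be worth checking that the key's comparison returns 0 for equal keys (a guess, not something this thread confirms).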

On Wed, Jun 29, 2011 at 2:07 PM, Aaron Baff <Aaron.Baff@telescope.tv> wrote:

> You probably need to implement a custom comparator, used as the grouping
> comparator, that compares the primary key and then, if they are the same,
> compares the int part of the key.
>
> --Aaron
>
>
> -----------------------------------------------------------------------------
> From: Trevor Adams [mailto:trevoradams@gmail.com]
> Sent: Wednesday, June 29, 2011 10:00 AM
> To: mapreduce-user@hadoop.apache.org
> Subject: Reduce method called same key twice
>
> So I have a custom key which is used for a join. It contains two fields, a
> boolean (is primary key) and an int (key). hashCode only looks at the key
> field, so that all records with the same key get sent to the same reducer.
> compareTo places the primary key at the top of the list when sorting. This
> works nicely, except that the reduce method is called with Key: 1 -> a single
> value, Key: 1 -> another value, etc., once for each value. So instead of
> bucketing the values under a key (and some of the keys are identical in every
> way), it sends 1 key and 1 value to the reducer at a time. How do I get it to
> bucket, or why isn't it bucketing?
>
> -Trevor
>
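The key described in the question can be sketched in plain Java as follows. This is a minimal standalone sketch, assuming only the two fields the message describes; in a real job the class would implement Hadoop's `WritableComparable` (adding `write`/`readFields`), which is left out so the example runs without Hadoop on the classpath. The class name `JoinKey` and the `partition` helper are hypothetical; `partition` uses the same formula as Hadoop's default `HashPartitioner`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java sketch of the composite join key (hypothetical name).
class JoinKey implements Comparable<JoinKey> {
    final boolean primary; // true for the record from the primary side of the join
    final int key;         // the join key

    JoinKey(boolean primary, int key) {
        this.primary = primary;
        this.key = key;
    }

    // Hash only the join key, so primary and secondary records for the
    // same key are sent to the same reducer.
    @Override
    public int hashCode() {
        return key;
    }

    // Consistent with compareTo: equal only when both fields match.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof JoinKey)) return false;
        JoinKey other = (JoinKey) o;
        return key == other.key && primary == other.primary;
    }

    // Sort by join key, then put the primary record first. Returns 0 for
    // identical keys, which is what allows them to be grouped.
    @Override
    public int compareTo(JoinKey o) {
        int c = Integer.compare(key, o.key);
        if (c != 0) return c;
        return Boolean.compare(o.primary, primary);
    }

    @Override
    public String toString() {
        return (primary ? "P" : "S") + key;
    }

    // Same formula as Hadoop's default HashPartitioner.
    static int partition(JoinKey k, int numReduceTasks) {
        return (k.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        List<JoinKey> keys = new ArrayList<>(Arrays.asList(
            new JoinKey(false, 2), new JoinKey(false, 1),
            new JoinKey(true, 1), new JoinKey(true, 2),
            new JoinKey(false, 1)));
        Collections.sort(keys);
        System.out.println(keys); // [P1, S1, S1, P2, S2]
        for (JoinKey k : keys) {
            System.out.println(k + " -> reducer " + partition(k, 3));
        }
    }
}
```

In the commonly used secondary-sort pattern, this full ordering is registered as the sort comparator (`Job.setSortComparatorClass`), while a separate grouping comparator (`Job.setGroupingComparatorClass`), typically a `WritableComparator` subclass, compares only `key`, so the primary record and all of its matching records arrive in a single reduce() call with the primary value first.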
