hadoop-common-user mailing list archives

From William Kinney <william.kin...@gmail.com>
Subject Re: WritableComparable and the case of duplicate keys in the reducer
Date Tue, 10 Jan 2012 22:51:54 GMT
I have noticed this too with one job. Keys that are equal (.equals() returns
true, their hashCode() values match, and compareTo() returns 0) are being sent
to multiple reduce tasks, resulting in incorrect output.

Any insight?
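
One pattern worth checking: the default HashPartitioner assigns a record to a
reduce task based on key.hashCode(), so if hashCode() is not derived purely
from the key's fields (or disagrees with compareTo()/equals()), equal keys
emitted by different mappers can land in different partitions and will never
meet in the same reducer. A minimal sketch of a key that keeps the three
methods consistent (PairKey and its fields are hypothetical, not from the job
discussed here):

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class PairKey implements WritableComparable<PairKey> {
    private String name;
    private long id;

    public PairKey() {}  // no-arg constructor required for deserialization

    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeLong(id);
    }

    public void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        id = in.readLong();
    }

    public int compareTo(PairKey other) {
        int c = name.compareTo(other.name);
        if (c != 0) return c;
        return (id < other.id) ? -1 : ((id == other.id) ? 0 : 1);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PairKey)) return false;
        PairKey p = (PairKey) o;
        return id == p.id && name.equals(p.name);
    }

    @Override
    public int hashCode() {
        // Deterministic and field-based: never the identity hash, never
        // mutable external state, so HashPartitioner routes equal keys
        // to the same partition every time.
        return 31 * name.hashCode() + (int) (id ^ (id >>> 32));
    }
}

If hashCode() fell back to Object's identity hash, equal keys would scatter
across reduce tasks exactly as described above.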


On Sat, Aug 13, 2011 at 11:14 AM, Stan Rosenberg <srosenberg@proclivitysystems.com> wrote:

> Hi All,
>
> Here is what's happening.  I have implemented my own WritableComparable
> keys and values.
> Inside a reducer I am seeing 'reduce' being invoked with the "same" key
> _twice_.
> I have checked that context.getKeyComparator() and
> context.getSortComparator() both return WritableComparator, which
> indicates that the 'compareTo' method of my key should be called when doing
> the reduce-side merge.
>
> Indeed, inside the 'reduce' method I captured both key instances and did
> the following checks:
>
> ((WritableComparator) context.getKeyComparator()).compare((Object) key1, (Object) key2)
> ((WritableComparator) context.getSortComparator()).compare((Object) key1, (Object) key2)
>
> In both calls, the result is '0', confirming that key1 and key2 are
> equivalent.
>
> So, what is going on?
>
> Note that key1 and key2 come from different mappers, but they should have
> been collapsed in the reducer since they are both equal according to
> WritableComparator.  Also note that key1 and key2 are not bitwise
> equivalent, but that shouldn't matter, or should it?
>
> Many thanks in advance!
>
> stan
>
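
As to whether bitwise equivalence should matter: by default it shouldn't. When
no raw comparator is registered for a key class, the generic WritableComparator
deserializes both keys inside its byte-level compare(byte[], ...) method and
delegates to compareTo(), so the serialized layout never participates in
grouping. Bytes only start to matter when a class registers an optimized
byte-level comparator. A hedged sketch of a deserializing comparator,
equivalent in spirit to that default path (PairKey is the hypothetical key
from the sketch above):

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class PairKeyComparator extends WritableComparator {

    public PairKeyComparator() {
        // 'true' asks the parent class to create key instances so that the
        // raw compare(byte[], ...) path can deserialize before comparing.
        super(PairKey.class, true);
    }

    @Override
    @SuppressWarnings("rawtypes")
    public int compare(WritableComparable a, WritableComparable b) {
        // Purely logical comparison; bitwise differences are invisible here.
        return ((PairKey) a).compareTo((PairKey) b);
    }

    static {
        // Registration is the step that could change grouping semantics:
        // a subclass that instead overrode the byte-level compare() would
        // make the serialized form significant.
        WritableComparator.define(PairKey.class, new PairKeyComparator());
    }
}

Since context.getSortComparator() reports the plain WritableComparator in this
job, compareTo() should indeed decide grouping during the merge; that points
the investigation toward partitioning (the hashCode() note above) or a
compareTo() that is not strictly consistent across instances.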
