hadoop-common-user mailing list archives

From William Kinney <william.kin...@gmail.com>
Subject Re: WritableComparable and the case of duplicate keys in the reducer
Date Tue, 10 Jan 2012 23:15:28 GMT
Naturally, right after sending that email I found that I was wrong: I was also
using an enum field, which was the culprit.
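(The thread doesn't say exactly how the enum broke things, but one common
mechanism fits these symptoms: the default HashPartitioner assigns a key to a
reduce task based on key.hashCode() modulo the number of reducers, and Java's
Enum.hashCode() is the inherited identity hash, which is not stable from one
JVM to the next. Mappers running in different task JVMs can therefore hash the
same logical key differently and send it to different reducers. Below is a
minimal sketch of a key that avoids this, using a hypothetical EventKey class
with a long field and an enum field; it is an illustration, not the poster's
actual code.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableUtils;

// Hypothetical key class, for illustration only.
public class EventKey implements WritableComparable<EventKey> {

    public enum Type { CLICK, VIEW }

    private long userId;
    private Type type;

    public EventKey() { }                       // Writables need a no-arg constructor

    public EventKey(long userId, Type type) {
        this.userId = userId;
        this.type = type;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(userId);
        WritableUtils.writeEnum(out, type);     // serializes the enum by name
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        userId = in.readLong();
        type = WritableUtils.readEnum(in, Type.class);
    }

    @Override
    public int compareTo(EventKey other) {
        if (userId != other.userId) {
            return userId < other.userId ? -1 : 1;
        }
        return type.compareTo(other.type);
    }

    // BAD:  31 * h + type.hashCode()
    //       Enum.hashCode() is the identity hash, so it differs between JVMs and
    //       the default HashPartitioner can split equal keys across reducers.
    // GOOD: hash something deterministic, e.g. the enum's name or ordinal.
    @Override
    public int hashCode() {
        int h = (int) (userId ^ (userId >>> 32));
        return 31 * h + type.name().hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof EventKey)) {
            return false;
        }
        EventKey k = (EventKey) o;
        return userId == k.userId && type == k.type;
    }
}

Serializing the enum with WritableUtils.writeEnum/readEnum and hashing on
type.name() keeps both the key's bytes and its partition assignment
deterministic across task JVMs.)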

On Tue, Jan 10, 2012 at 6:13 PM, William Kinney <william.kinney@gmail.com> wrote:

> I'm (unfortunately) aware of this and this isn't the issue. My key object
> contains only long, int and String values.
> The job map output is consistent, but the reduce input groups and values
> for the key vary from one job to the next on the same input. It's like it
> isn't properly comparing and partitioning the keys.
> I have properly implemented hashCode(), equals(), and the
> WritableComparable methods.
> Also, not surprisingly, when I use 1 reduce task the output is correct.
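(One quick way to sanity-check "properly implemented" is to round-trip the key
through its own serialization and confirm that equals(), hashCode(),
compareTo(), and the partition assignment all agree on the copy. The sketch
below reuses the hypothetical EventKey class from above together with the
stock HashPartitioner; it is an illustration, not code from this thread.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

// Round-trips a key through write()/readFields() and checks the invariants
// the shuffle relies on.
public class KeyInvariantCheck {

    public static void main(String[] args) throws IOException {
        EventKey original = new EventKey(42L, EventKey.Type.CLICK);

        // Serialize, then deserialize into a fresh instance, as the framework does.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));
        EventKey copy = new EventKey();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println("equals:    " + original.equals(copy));
        System.out.println("hashCode:  " + (original.hashCode() == copy.hashCode()));
        System.out.println("compareTo: " + (original.compareTo(copy) == 0));

        // Equal keys must also land in the same partition.
        HashPartitioner<EventKey, Object> partitioner =
                new HashPartitioner<EventKey, Object>();
        int reducers = 10;
        System.out.println("partition: "
                + (partitioner.getPartition(original, null, reducers)
                   == partitioner.getPartition(copy, null, reducers)));
    }
}

A single-JVM check like this would not have caught the Enum.hashCode()
problem, since the identity hash only changes from one JVM to the next, but it
does catch mismatched write()/readFields() or hashCode()/equals() logic, which
produces exactly these symptoms: groups that split or vary between runs, yet
correct output with a single reducer.)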
> On Tue, Jan 10, 2012 at 5:58 PM, W.P. McNeill <billmcn@gmail.com> wrote:
>> The Hadoop framework reuses Writable objects for key and value arguments,
>> so if your code stores a pointer to that object instead of copying it, you
>> can find yourself with mysterious duplicate objects. This has tripped me up
>> a number of times. Details on what exactly I encountered and how I fixed it
>> are here:
>> http://cornercases.wordpress.com/2011/03/14/serializing-complex-mapreduce-keys/
>> and here:
>> http://cornercases.wordpress.com/2011/08/18/hadoop-object-reuse-pitfall-all-my-reducer-values-are-the-same/
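(For readers who land here later: a minimal sketch of the reuse pitfall
described above, with illustrative class and field names not taken from the
linked posts. The reducer buffers its values; keeping the raw reference would
leave the list full of copies of the last value, because the framework refills
the same Text instance on every iteration.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch of the object-reuse pitfall: Hadoop hands the reducer the *same*
// Text instance on every iteration, refilled with new contents, so saving
// the reference saves N copies of the last value.
public class CollectValuesReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {

        List<Text> collected = new ArrayList<Text>();
        for (Text value : values) {
            // BROKEN: collected.add(value);   // every element ends up identical
            collected.add(new Text(value));    // copy before keeping a reference
            // For custom Writables: WritableUtils.clone(value, context.getConfiguration())
        }

        for (Text value : collected) {
            context.write(key, value);
        }
    }
}

The same applies to mappers and to keys: copy any Writable you intend to hold
on to beyond the current call.)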
