Subject: Re: WritableComparable and the case of duplicate keys in the reducer
From: William Kinney
To: common-user@hadoop.apache.org
Date: Tue, 10 Jan 2012 18:15:28 -0500

Naturally, right after I sent that email I found that I was wrong. I was also using an enum field, which was the culprit.

On Tue, Jan 10, 2012 at 6:13 PM, William Kinney wrote:

> I'm (unfortunately) aware of this, and it isn't the issue. My key object
> contains only long, int and String values.
>
> The job's map output is consistent, but the reduce input groups and values
> for the key vary from one job to the next on the same input. It's as if
> the keys aren't being compared and partitioned properly.
>
> I have properly implemented hashCode(), equals() and the
> WritableComparable methods.
>
> Also, not surprisingly, when I use 1 reduce task the output is correct.
>
>
> On Tue, Jan 10, 2012 at 5:58 PM, W.P. McNeill wrote:
>
>> The Hadoop framework reuses Writable objects for key and value arguments,
>> so if your code stores a reference to that object instead of copying it,
>> you can find yourself with mysterious duplicate objects. This has tripped
>> me up a number of times. Details on exactly what I encountered and how I
>> fixed it are here:
>>
>> http://cornercases.wordpress.com/2011/03/14/serializing-complex-mapreduce-keys/
>>
>> and here:
>>
>> http://cornercases.wordpress.com/2011/08/18/hadoop-object-reuse-pitfall-all-my-reducer-values-are-the-same/
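One plausible reading of why the enum field mattered (the thread does not spell it out): `java.lang.Enum.hashCode()` is final and returns the identity hash, so the same constant can hash differently in each task JVM. Hadoop's default HashPartitioner computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, so logically equal keys emitted by different map tasks can be sent to different reducers, while with a single reduce task every key lands in partition 0. That matches the "1 reduce task is correct" symptom. The sketch below is plain Java with no Hadoop dependency; `EventType`, `StableKeyDemo` and the field values are hypothetical, and `partitionFor` only mirrors the HashPartitioner formula.

```java
import java.util.Objects;

// Hypothetical enum; the thread only says the key had "an enum field".
enum EventType { CLICK, VIEW, PURCHASE }

final class StableKeyDemo {

    // Mirrors the formula used by Hadoop's default HashPartitioner.
    static int partitionFor(int keyHash, int numReduceTasks) {
        return (keyHash & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Risky: Objects.hash calls type.hashCode(), and Enum.hashCode() is the
    // identity hash, which can differ between the JVMs running map tasks.
    static int unstableHash(long id, int count, String label, EventType type) {
        return Objects.hash(id, count, label, type);
    }

    // Deterministic: each component's hash depends only on its value, and the
    // enum contributes the hash of its name() rather than its identity.
    static int stableHash(long id, int count, String label, EventType type) {
        int h = Long.hashCode(id);
        h = 31 * h + count;
        h = 31 * h + label.hashCode();
        h = 31 * h + type.name().hashCode();
        return h;
    }
}
```

In a real key class the same idea goes inside `hashCode()`: hash `type.name()` or `type.ordinal()` instead of `type` itself. The ordinal is cheaper but changes if the constants are reordered; `name()` survives reordering but not renaming.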
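The object-reuse pitfall W.P. McNeill describes can be reproduced without Hadoop. In this minimal sketch, `ReusedValue` is a hypothetical stand-in for a mutable Writable such as Text, and `reusingIterable` imitates a reducer's values iterator that refills one shared instance. Storing references keeps only the last value; copying before storing keeps them all. In actual Hadoop code the copy step would be something like `new Text(value)` or `WritableUtils.clone(value, conf)`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a mutable Writable such as Text.
final class ReusedValue {
    private String contents = "";
    void set(String s) { contents = s; }
    String get() { return contents; }
    ReusedValue copy() {
        ReusedValue r = new ReusedValue();
        r.set(contents);
        return r;
    }
}

final class ObjectReuseDemo {

    // Imitates a values iterator that refills one shared object each time.
    static Iterable<ReusedValue> reusingIterable(List<String> raw) {
        ReusedValue shared = new ReusedValue();
        return () -> new Iterator<ReusedValue>() {
            private int i = 0;
            @Override public boolean hasNext() { return i < raw.size(); }
            @Override public ReusedValue next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(raw.get(i++));
                return shared;
            }
        };
    }

    // Bug: stores the same reference on every iteration.
    static List<String> collectReferences(Iterable<ReusedValue> values) {
        List<ReusedValue> kept = new ArrayList<>();
        for (ReusedValue v : values) kept.add(v);
        return contentsOf(kept);
    }

    // Fix: copy each value before storing it.
    static List<String> collectCopies(Iterable<ReusedValue> values) {
        List<ReusedValue> kept = new ArrayList<>();
        for (ReusedValue v : values) kept.add(v.copy());
        return contentsOf(kept);
    }

    private static List<String> contentsOf(List<ReusedValue> kept) {
        List<String> out = new ArrayList<>();
        for (ReusedValue v : kept) out.add(v.get());
        return out;
    }
}
```

Fed the values "a", "b", "c", `collectReferences` returns [c, c, c] because every stored reference points at the one refilled object, while `collectCopies` returns [a, b, c].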