From: "Ankur C. Goel"
To: "general@hadoop.apache.org"
Date: Tue, 25 May 2010 23:51:24 +0530
Subject: Re: Hash Partitioner

Another thing I would look at critically is whether there are any bugs in the write() and readFields() methods of my key class, as there could be hard-to-identify serialization / de-serialization issues.

-@nkur

On 5/25/10 8:39 PM, "Eric Sammer" wrote:

On Mon, May 24, 2010 at 6:32 PM, Deepika Khera wrote:
> Thanks for your response Eric.
>
> I am using hadoop 0.20.2.
>
> Here is what the hashCode() implementation looks like (I actually had the IDE generate it for me)
>
> Main key (for mapper & reducer):
>
>     public int hashCode() {
>         int result = kVersion;
>         result = 31 * result + (aKey != null ? aKey.hashCode() : 0);
>         result = 31 * result + (gKey != null ? gKey.hashCode() : 0);
>         result = 31 * result + (int) (date ^ (date >>> 32));
>         result = 31 * result + (ma != null ? ma.hashCode() : 0);
>         result = 31 * result + (cl != null ? cl.hashCode() : 0);
>         return result;
>     }
>
>
> aKey : AKey class
>
>
>     public int hashCode() {
>         int result = kVersion;
>         result = 31 * result + (v != null ? v.hashCode() : 0);
>         result = 31 * result + (s != null ? s.hashCode() : 0);
>         result = 31 * result + (o != null ? o.hashCode() : 0);
>         result = 31 * result + (l != null ? l.hashCode() : 0);
>         result = 31 * result + (e ? 1 : 0); //boolean
>         result = 31 * result + (li ? 1 : 0); //boolean
>         result = 31 * result + (aut ? 1 : 0); //boolean
>         return result;
>     }
>

Both of these look fine, assuming all the other hashCode()s return the
same value every time.

> When this happens, I do see the same values for the key. Also I am not using a grouping comparator.

So you see two reduce methods getting the same key with the same
values? That's extremely odd. If this is the case, there's a bug in
Hadoop. Can you find the relevant logs from the reducers where Hadoop
fetches the map output? Does it look like it's fetching the same output
twice? Do the two tasks where you see the duplicates have the same
task ID? Can you confirm the reduce tasks are from the same job ID for
us?

> I was wondering since the call to HashPartitioner.getPartition() is done from a map task, several of which are running on different machines, is it possible that they get a different hashcode and hence get different reducers assigned even when the key is the same.

The hashCode() result should *always* be the same given the same
internal state. In other words, it should be consistent and stable. If
I have a string new String("hello world") it will always have the
exact same hashCode(). If this isn't true, you will get wildly
unpredictable results not just with Hadoop but with Java's
comparators, collections, etc.

--
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
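
The two checks suggested in this thread (a write()/readFields() round trip, and a hashCode() that depends only on internal state) can be tried outside a cluster. What follows is a minimal sketch, not the poster's code: SampleKey, its fields, and the PartitionCheck helper are hypothetical names made up for illustration, and the key mirrors the method shapes of Hadoop's Writable without depending on Hadoop. The partitionFor() arithmetic is the same expression Hadoop's HashPartitioner uses. One general Java fact worth keeping in mind while reading it: Enum.hashCode() and the default Object.hashCode() are identity-based and can differ between JVMs; whether any field of the key in this thread is of such a type is not stated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical key class; the fields are illustrative only.
final class SampleKey {
    private int version;
    private String name; // may be null
    private long date;
    private boolean flag;

    SampleKey() {}

    SampleKey(int version, String name, long date, boolean flag) {
        this.version = version;
        this.name = name;
        this.date = date;
        this.flag = flag;
    }

    // Same shape as Writable.write(): every field that takes part in
    // equals()/hashCode() is written, in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeInt(version);
        out.writeBoolean(name != null); // null marker
        if (name != null) {
            out.writeUTF(name);
        }
        out.writeLong(date);
        out.writeBoolean(flag);
    }

    // Same shape as Writable.readFields(): reads in the same order as
    // write() and overwrites every field, including resetting name to null.
    void readFields(DataInput in) throws IOException {
        version = in.readInt();
        name = in.readBoolean() ? in.readUTF() : null;
        date = in.readLong();
        flag = in.readBoolean();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SampleKey)) return false;
        SampleKey k = (SampleKey) o;
        return version == k.version && date == k.date && flag == k.flag
                && Objects.equals(name, k.name);
    }

    // Built only from field values, so it is stable across JVMs.
    @Override
    public int hashCode() {
        int result = version;
        result = 31 * result + (name != null ? name.hashCode() : 0);
        result = 31 * result + (int) (date ^ (date >>> 32));
        result = 31 * result + (flag ? 1 : 0);
        return result;
    }
}

final class PartitionCheck {
    // Same arithmetic as HashPartitioner.getPartition().
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Serializes the key to bytes and reads it back into a fresh instance.
    static SampleKey roundTrip(SampleKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            SampleKey copy = new SampleKey();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SampleKey key = new SampleKey(1, "hello world", 1274811684000L, true);
        SampleKey copy = roundTrip(key);
        System.out.println("equal after round trip: " + key.equals(copy));
        System.out.println("same partition: "
                + (partitionFor(key, 8) == partitionFor(copy, 8)));
    }
}
```

If the round-tripped copy is not equal to the original, or lands in a different partition, the problem is in the key class rather than in Hadoop. A field that is skipped or read in the wrong order in readFields() is the usual thing to look for first.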