From: Dennis
To: mapreduce-user@hadoop.apache.org
Subject: Re: RawComparator
Date: Wed, 11 Aug 2010 09:11:01 +0800 (CST)
Message-ID: <204485.85921.qm@web15202.mail.cnb.yahoo.com>

Thanks, Josh, for your professional answers.

The following is my "composite key" class. In the Q1003InPairComparator.compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) method, I figured out two ways of recovering the original String values and comparing them. The first, "using streams", uses the java.io.* package to deserialize the original String values, which makes the comparison easy. The second, "using byte[]", deserializes the byte[] by hand; this is not difficult, and I do not have to "new" any stream classes from java.io.*.

So,
1. What do you say about the two methods? Are there any better ones?
2. If I use the first, "using streams", method, how should I deal with the IOException? If an exception is thrown, what value should I return? (In the following code, I simply returned -1. I know that is not a smart thing to do.)


    public static class Q1003InPair implements WritableComparable<Q1003InPair> {
        private String dateStr;
        private String str;

        public void set(String dateStr, String str) {
            this.dateStr = dateStr;
            this.str = str;
        }

        public String getDateStr() {
            return this.dateStr;
        }

        public String getStr() {
            return this.str;
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            dateStr = in.readUTF();
            str = in.readUTF();
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(dateStr);
            out.writeUTF(str);
        }

        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result
                    + ((dateStr == null) ? 0 : dateStr.hashCode());
            result = prime * result + ((str == null) ? 0 : str.hashCode());
            return result;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj)
                return true;
            if (obj == null)
                return false;
            if (getClass() != obj.getClass())
                return false;
            final Q1003InPair other = (Q1003InPair) obj;
            if (dateStr == null) {
                if (other.dateStr != null)
                    return false;
            } else if (!dateStr.equals(other.dateStr))
                return false;
            if (str == null) {
                if (other.str != null)
                    return false;
            } else if (!str.equals(other.str))
                return false;
            return true;
        }

        public String toString() {
            StringBuffer sb = new StringBuffer();
            sb.append(this.getDateStr());
            sb.append(",");
            sb.append(this.getStr());
            return sb.toString();
        }

        public static class Q1003InPairComparator extends WritableComparator {
            public Q1003InPairComparator() {
                super(Q1003InPair.class);
            }

            public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,
                    int l2) {
                // 1. using streams
                // ByteArrayInputStream bais1 = null;
                // ByteArrayInputStream bais2 = null;
                // DataInputStream dis1 = null;
                // DataInputStream dis2 = null;
                // try {
                // bais1 = new ByteArrayInputStream(b1);
                // dis1 = new DataInputStream(bais1);
                // dis1.skip(s1);
                // String str1 = dis1.readUTF();
                // String str3 = dis1.readUTF();
                //
                // bais2 = new ByteArrayInputStream(b2);
                // dis2 = new DataInputStream(bais2);
                // dis2.skip(s2);
                // String str2 = dis2.readUTF();
                // String str4 = dis2.readUTF();
                //
                // if (str1.equals(str2)) {
                // return str3.compareTo(str4);
                // } else {
                // return str1.compareTo(str2);
                // }
                // } catch(Exception e) {
                // } finally {
                // try {
                // dis1.close();
                // } catch(Exception e) {}
                // try {
                // dis2.close();
                // } catch(Exception e) {}
                // try {
                // bais1.close();
                // } catch(Exception e) {}
                // try {
                // bais2.close();
                // } catch(Exception e) {}
                // }
                // return -1;

                // 2. using byte[]
                int strLength1 = ((int) b1[s1]) * 256 + ((int) b1[s1 + 1]);
                String str1 = new String(b1, s1 + 2, strLength1);
                s1 += strLength1 + 2;
                int strLength3 = ((int) b1[s1]) * 256 + ((int) b1[s1 + 1]);
                String str3 = new String(b1, s1 + 2, strLength3);
                int strLength2 = ((int) b2[s2]) * 256 + ((int) b2[s2 + 1]);
                String str2 = new String(b2, s2 + 2, strLength2);
                s2 += strLength2 + 2;
                int strLength4 = ((int) b2[s2]) * 256 + ((int) b2[s2 + 1]);
                String str4 = new String(b2, s2 + 2, strLength4);

                if (str1.equals(str2)) {
                    return str3.compareTo(str4);
                } else {
                    return str1.compareTo(str2);
                }
            }
        }

        static {
            WritableComparator.define(Q1003InPair.class,
                    new Q1003InPairComparator());
        }

        @Override
        public int compareTo(Q1003InPair o) {
            if (!this.getDateStr().equals(o.getDateStr())) {
                return this.getDateStr().equals(o.getDateStr()) ? 0 : (this
                        .getDateStr().compareTo(o.getDateStr()) > 0 ? 1 : -1);
            } else if (!this.getStr().equals(o.getStr())) {
                return this.getStr().equals(o.getStr()) ? 0 : (this.getStr()
                        .compareTo(o.getStr()) > 0 ? 1 : -1);
            } else {
                return 0;
            }
        }
    }
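
On question 1, here is a hedged sketch (not something posted in this thread) of how the "using byte[]" idea can avoid building Strings at all. `writeUTF` emits an unsigned 2-byte big-endian length followed by modified UTF-8 bytes, so each field can be located from its prefix and compared byte by byte. Two details are easy to miss in the byte[] variant above: Java bytes are signed, so `((int) b1[s1]) * 256` goes wrong once a length byte is 0x80 or higher unless it is masked with 0xff, and `new String(byte[], int, int)` decodes with the platform default charset rather than modified UTF-8. The class name `Utf8PairBytes` and its helpers are hypothetical, and `compareBytes` is a plain-Java stand-in for Hadoop's `WritableComparator.compareBytes`. As far as I can tell, unsigned byte order of modified UTF-8 agrees with `String.compareTo` except for strings containing U+0000.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical helper: orders keys serialized as writeUTF(dateStr) + writeUTF(str). */
final class Utf8PairBytes {
    private Utf8PairBytes() {}

    /** Reads the unsigned 2-byte big-endian length prefix written by writeUTF. */
    static int readUnsignedShort(byte[] b, int off) {
        // Mask each byte: Java bytes are signed, the prefix is not.
        return ((b[off] & 0xff) << 8) | (b[off + 1] & 0xff);
    }

    /** Lexicographic comparison of unsigned bytes (stand-in for WritableComparator.compareBytes). */
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2;
    }

    /** Compares by the first field, then by the second, without allocating Strings. */
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int len1 = readUnsignedShort(b1, s1);
        int len2 = readUnsignedShort(b2, s2);
        int c = compareBytes(b1, s1 + 2, len1, b2, s2 + 2, len2);
        if (c != 0) {
            return c;
        }
        // Skip past the first field (2-byte prefix + payload) to reach the second.
        int t1 = s1 + 2 + len1;
        int t2 = s2 + 2 + len2;
        return compareBytes(b1, t1 + 2, readUnsignedShort(b1, t1),
                            b2, t2 + 2, readUnsignedShort(b2, t2));
    }

    /** Test helper: serializes a pair the same way Q1003InPair.write does. */
    static byte[] serialize(String dateStr, String str) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeUTF(dateStr);
            out.writeUTF(str);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a real job the body of `compare` would live in the `WritableComparator` subclass, with Hadoop's own `compareBytes` in place of the stand-in.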

Thanks again.
Dennis
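
On question 2, a minimal sketch under stated assumptions rather than an answer from the thread: returning -1 on failure would tell the sort that one key is "smaller" when in fact nothing was compared, so a common choice is to rethrow the `IOException` as an unchecked exception and let the task fail. If memory serves, Hadoop's default `WritableComparator.compare` over raw bytes wraps the `IOException` in a `RuntimeException` in the same spirit. The class name `StreamPairCompare` is hypothetical. The sketch also passes the offset and length to `ByteArrayInputStream` instead of calling `skip`, and omits the `close()` calls because closing a `ByteArrayInputStream` has no effect.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical helper: stream-based comparison of two writeUTF-encoded fields. */
final class StreamPairCompare {
    private StreamPairCompare() {}

    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
            // Restrict each stream to the key's own slice of the buffer.
            DataInputStream in1 = new DataInputStream(new ByteArrayInputStream(b1, s1, l1));
            DataInputStream in2 = new DataInputStream(new ByteArrayInputStream(b2, s2, l2));
            int c = in1.readUTF().compareTo(in2.readUTF());
            if (c != 0) {
                return c;
            }
            return in1.readUTF().compareTo(in2.readUTF());
        } catch (IOException e) {
            // A key that cannot be decoded is a bug or corrupt data; fail loudly
            // instead of returning an arbitrary ordering.
            throw new IllegalStateException("cannot decode serialized key", e);
        }
    }
}
```

The trade-off against the byte-level approach is allocation: this version creates two streams and up to four Strings per comparison.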


--- On Wed, 8/11/10, Josh Patterson <josh@cloudera.com> wrote:

From: Josh Patterson <josh@cloudera.com>
Subject: Re: RawComparator
To: mapreduce-user@hadoop.apache.org
Cc: general@hadoop.apache.org
Date: Wednesday, August 11, 2010, 3:21 AM

Dennis,

On Tue, Aug 10, 2010 at 4:01 AM, Dennis <arsenepark@yahoo.com.cn> wrote:

Hi, guys,

I am using hadoop 0.20.2, and I am trying to run the "SecondarySort" example. The following is the "FirstGroupingComparator" class, and I just cannot figure out how "WritableComparator.compareBytes(b1, s1, Integer.SIZE / 8, b2, s2, Integer.SIZE / 8)" works. There is very little Javadoc for this class or this method.

1. Why is it "Integer.SIZE / 8"?

That says "take the size of an integer in bits and divide it by 8". In Java, Integer.SIZE gives the number of bits in the datatype, which is always 32: as far as I know, the integer bit width does not change with the architecture, so on both 32-bit and 64-bit systems this gives 32 / 8 == 4. So it is saying here "compare the first 4 bytes of each byte array" (the width, in bytes, of the first integer in the composite key).
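
A small sketch of that arithmetic, assuming a key laid out as two big-endian ints (the layout `DataOutput.writeInt` produces). The class `FirstIntBytes` and its methods are hypothetical and do not use Hadoop; the loop imitates what comparing the first `Integer.SIZE / 8` bytes amounts to. Note that unsigned byte order matches numeric order only for non-negative ints, which does not matter when the comparator is used purely to decide whether two keys belong to the same group.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: compares only the first int of a serialized {first, second} key. */
final class FirstIntBytes {
    /** Width of a Java int in bytes: 32 bits / 8 = 4 on every platform. */
    static final int INT_BYTES = Integer.SIZE / 8;

    private FirstIntBytes() {}

    /** Compares the first INT_BYTES bytes of each key as unsigned bytes. */
    static int compareFirstInt(byte[] b1, int s1, byte[] b2, int s2) {
        for (int i = 0; i < INT_BYTES; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return 0;
    }

    /** Serializes two ints big-endian, like two writeInt calls. */
    static byte[] pair(int first, int second) {
        return ByteBuffer.allocate(2 * INT_BYTES).putInt(first).putInt(second).array();
    }
}
```

Two keys such as `pair(7, 1)` and `pair(7, 99)` compare as equal here because the second int is never looked at, which is exactly what a grouping comparator wants.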

WritableComparators are useful in the shuffle phase of Hadoop: we are constantly comparing and sorting WritableComparables, and the secondary-sort mechanics allow a group of data for a key to arrive at the reducer in a certain order. For example, with time series data we may want a range of timestamps in one group, but also want them in order when they are processed inside the reducer.
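
To make the mechanics concrete, here is a rough in-memory imitation, not Hadoop code: keys are hypothetical `{first, second}` int pairs, the full ordering plays the role of the sort comparator, and collecting by `first` alone plays the role of the grouping comparator. The class name `SecondarySortDemo` is invented for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory imitation of sort-then-group in a secondary sort. */
final class SecondarySortDemo {
    private SecondarySortDemo() {}

    /**
     * Sorts {first, second} keys by first and then second, and groups the
     * sorted keys by first only. Each map entry corresponds to one reduce
     * call whose values arrive already ordered by second.
     */
    static Map<Integer, List<Integer>> sortAndGroup(List<int[]> keys) {
        List<int[]> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.<int[]>comparingInt(k -> k[0])
                .thenComparingInt(k -> k[1]));
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (int[] k : sorted) {
            groups.computeIfAbsent(k[0], unused -> new ArrayList<>()).add(k[1]);
        }
        return groups;
    }
}
```

With `first` as a series identifier and `second` as a timestamp, each group holds one series with its timestamps in ascending order.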

2. If I want to compare two Strings here, how should I write the code?

    public static class FirstGroupingComparator implements
            RawComparator<IntPair> {
        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            int ret = WritableComparator.compareBytes(b1, s1, Integer.SIZE / 8,
                    b2, s2, Integer.SIZE / 8);
            return ret;
        }

        @Override
        public int compare(IntPair o1, IntPair o2) {
            int l = o1.getFirst();
            int r = o2.getFirst();
            return l == r ? 0 : (l < r ? -1 : 1);
        }
    }

In the case of comparing strings, let's say for example you have a "composite key" that has two String or Text object members (k1, k2). We "group by" the first part of the key, k1, and we sort by this key as well (we block ranges together). This is very similar to the example above. Since with a RawComparator we are looking to deserialize only a portion of the data to do the comparison, you'll need a strategy for the compare() function that takes into account that the strings are variable length (which means we are unable to simply read 4 bytes as in the case of the integer). The challenge here is to deserialize only the portion of the composite key that contains the string/text that you want to compare against, which is going to be a variable number of bytes each time. A good place to start looking for ideas would be the Text class in Hadoop and also WritableUtils.
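
One possible strategy, sketched under the assumption that k1 was written with `DataOutput.writeUTF` (an unsigned 2-byte length prefix followed by the encoded bytes): read the prefix, then compare exactly that many bytes and ignore the rest of the key. The class `FirstFieldGrouping` is hypothetical. For `Text` fields the prefix is a variable-length int instead, and I believe `WritableUtils.decodeVIntSize` and `WritableComparator.readVInt` are the helpers to look at there, but treat that as a pointer to check against the Javadoc rather than as tested code.

```java
/** Hypothetical helper: groups composite keys by their first writeUTF-encoded string. */
final class FirstFieldGrouping {
    private FirstFieldGrouping() {}

    /** Compares only the first length-prefixed string of each serialized key. */
    static int compareFirstField(byte[] b1, int s1, byte[] b2, int s2) {
        // The length varies per key, so it has to be read from the prefix
        // instead of being a constant such as Integer.SIZE / 8.
        int len1 = ((b1[s1] & 0xff) << 8) | (b1[s1 + 1] & 0xff);
        int len2 = ((b2[s2] & 0xff) << 8) | (b2[s2 + 1] & 0xff);
        int n = Math.min(len1, len2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + 2 + i] & 0xff;
            int y = b2[s2 + 2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // A shared prefix sorts the shorter string first.
        return len1 - len2;
    }
}
```

Only the first field's bytes are touched, so keys that differ in k2 but share k1 compare as equal and land in the same reduce group.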

Josh Patterson
Cloudera

Thanks.
Dennis
