hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mahesh Balija <balijamahesh....@gmail.com>
Subject Re: WordPairCount Mapreduce question.
Date Mon, 25 Feb 2013 08:14:47 GMT
byte array comparison is for performance reasons only, but NOT the way you
are thinking.
This method comes from an interface called RawComparator which provides the
prototype (public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,
int l2);) for this method.
In the sorting phase where the keys are sorted, because of this
implementation the records are read from the stream directly and sorted
without the need to deserializing them into Objects.

Best,
Mahesh Balija,
CalsoftLabs.

On Sun, Feb 24, 2013 at 5:01 PM, Sai Sai <saigraph@yahoo.in> wrote:

> Thanks Mahesh for your help.
>
> Wondering if u can provide some insight with the below compare method
> using byte[] in the SecondarySort example:
>
> public static class Comparator extends WritableComparator {
>         public Comparator() {
>             super(URICountKey.class);
>         }
>
>         public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,
> int l2) {
>             return compareBytes(b1, s1, l1, b2, s2, l2);
>         }
>     }
>
> My question is in the below compare method that i have given we are
> comparing word1/word2
> which makes sense but what about this byte[] comparison, is it right in
> assuming  it converts each objects word1/word2/word3 to byte[] and compares
> them.
> If so is it for performance reason it is done.
> Could you please verify.
> Thanks
> Sai
>   ------------------------------
> *From:* Mahesh Balija <balijamahesh.mca@gmail.com>
> *To:* user@hadoop.apache.org; Sai Sai <saigraph@yahoo.in>
> *Sent:* Saturday, 23 February 2013 5:23 AM
> *Subject:* Re: WordPairCount Mapreduce question.
>
> Please check the in-line answers...
>
> On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai <saigraph@yahoo.in> wrote:
>
>
> Hello
>
> I have a question about how Mapreduce sorting works internally with
> multiple columns.
>
> Below r my classes using 2 columns in an input file given below.
>
> 1st question: About the method hashCode, we r adding a "31 + ", i am
> wondering why is this required. what does 31 refer to.
>
> This is how usually hashcode is calculated for any String instance
> (s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]) where n stands for length of
> the String. Since in your case you only have 2 chars then it will be a *
> 31^0 + b * 31^1.
>
>
>
> 2nd question: what if my input file has 3 columns instead of 2 how would
> you write a compare method and was wondering if anyone can map this to a
> real world scenario it will be really helpful.
>
> you will extend the same approach for the third column,
>  public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>             if(diff==0){
>                  diff = word3.compareTo(o.word3);
>             }
>          }
>         return diff;
>     }
>
>
>
>
>     @Override
>     public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>         }
>         return diff;
>     }
>
>     @Override
>     public int hashCode() {
>         return word1.hashCode() + 31 * word2.hashCode();
>     }
>
> ******************************
>
> Here is my input file wordpair.txt
>
> ******************************
>
> a    b
> a    c
> a    b
> a    d
> b    d
> e    f
> b    d
> e    f
> b    d
>
> **********************************
>
> Here is my WordPairObject:
>
> *********************************
>
> public class WordPairCountKey implements
> WritableComparable<WordPairCountKey> {
>
>     private String word1;
>     private String word2;
>
>     @Override
>     public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>         }
>         return diff;
>     }
>
>     @Override
>     public int hashCode() {
>         return word1.hashCode() + 31 * word2.hashCode();
>     }
>
>
>     public String getWord1() {
>         return word1;
>     }
>
>     public void setWord1(String word1) {
>         this.word1 = word1;
>     }
>
>     public String getWord2() {
>         return word2;
>     }
>
>     public void setWord2(String word2) {
>         this.word2 = word2;
>     }
>
>     @Override
>     public void readFields(DataInput in) throws IOException {
>         word1 = in.readUTF();
>         word2 = in.readUTF();
>     }
>
>     @Override
>     public void write(DataOutput out) throws IOException {
>         out.writeUTF(word1);
>         out.writeUTF(word2);
>     }
>
>
>     @Override
>     public String toString() {
>         return "[word1=" + word1 + ", word2=" + word2 + "]";
>     }
>
> }
>
> ******************************
>
> Any help will be really appreciated.
> Thanks
> Sai
>
>
>
>
>

Mime
View raw message