hadoop-common-user mailing list archives

From "Daniel,Wu" <hadoop...@163.com>
Subject Re:Re:Re:Re:Re: one question in the book of "hadoop:definitive guide 2 edition"
Date Fri, 05 Aug 2011 00:50:02 GMT
Hi John,

Another finding: if I remove the loop over the values (i.e. remove for (NullWritable iw:values)), then
the result is the MAX temperature for each year, whereas my original test returned the MIN
temperature for each year.  The book mentions that the value is mutable; I think the key may also be
mutable, meaning that as we loop over each value in the Iterable<NullWritable>, the content
of the key object is reset.  Since the input is sorted, if we don't loop at all (as in
the new test), the key we are left with at the end of the reduce function is the first record in the
group, which has the MAX value.  If we loop over every value in the list, say 100 times, the
content of the key also changes 100 times, and the key we are left with at the end of the reduce
function is the last key, which has the MIN value.  This theory of a mutable key explains how the
tests behave.  I just need to figure out why each iteration of the statement for (NullWritable iw:values)
changes the content of the key.  If anyone knows, please tell me.  (A sketch of copying the key's
fields into local variables, so they survive the loop, follows the output below.)

    // Reducer from the secondary-sort test: the key is an IntPair and the
    // values are NullWritables.  The loop over the values is commented out
    // for this test; only the key left at the end is printed and written.
    public void reduce(IntPair key, Iterable<NullWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int count = 0;
      /* Commented out: iterating the values appears to change the content
         of the key object as well.
      for (NullWritable iw : values) {
        count++;
        System.out.print(key.getFirst());
        System.out.print(" : ");
        System.out.println(key.getSecond());
      }
      */
      // System.out.println("number of records for this group " + Integer.toString(count));
      System.out.println("-----------------biggest key is--------------------------");
      System.out.print(key.getFirst());
      System.out.print("   -----    ");
      System.out.println(key.getSecond());
      context.write(key, NullWritable.get());
    }
  }


-----------------biggest key is--------------------------
0   -----    97
-----------------biggest key is--------------------------
4   -----    99
-----------------biggest key is--------------------------
8   -----    99
-----------------biggest key is--------------------------
12   -----    97
-----------------biggest key is--------------------------
16   -----    98
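
To make the idea concrete, here is a minimal sketch of the reducer with the key's fields copied
into local int variables before the loop, so the copies are unaffected if the framework reuses
the key object while iterating.  It assumes the IntPair writable from the book's secondary-sort
example (getFirst(), getSecond(), and a two-int constructor); this is only an illustration, not
the book's exact code.

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sketch: IntPair is assumed to be the custom WritableComparable from the
    // book's secondary-sort example.
    public class MaxTemperatureReducer
        extends Reducer<IntPair, NullWritable, IntPair, NullWritable> {

      @Override
      protected void reduce(IntPair key, Iterable<NullWritable> values, Context context)
          throws IOException, InterruptedException {
        // Copy the key's fields into locals before iterating: the framework
        // reuses the same key object, so its contents change as the values
        // are consumed, but these primitive copies do not.
        int year = key.getFirst();
        int maxTemp = key.getSecond();  // first record of the group carries the largest temperature
        for (NullWritable ignored : values) {
          // key.getSecond() observed here would keep decreasing;
          // year and maxTemp above are unaffected.
        }
        System.out.println(year + " : " + maxTemp);
        // Two-int constructor assumed from the book's IntPair.
        context.write(new IntPair(year, maxTemp), NullWritable.get());
      }
    }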



At 2011-08-04 20:51:01, "John Armstrong" <john.armstrong@ccri.com> wrote:
>On Thu, 4 Aug 2011 14:07:12 +0800 (CST), "Daniel,Wu" <hadoop_wu@163.com>
>wrote:
>> I am using the new API (the release is from Cloudera).  We can see from
>> the output that for each call of the reduce function, 100 records were
>> processed, but as reduce is defined as
>> reduce(IntPair key, Iterable<NullWritable> values, Context context), the
>> key should be fixed (not change) during a single execution.  The strange
>> thing is that for each iteration over Iterable<NullWritable> values, the
>> key is different!!!!!!  Using your explanation, the same information
>> (0:97) should be repeated 100 times, but actually it is 0:97, 0:97,
>> 0:96 ... 0:0, as below
>
>Ah, but they're NOT different! That's the whole point!
>
>Think carefully: how does Hadoop decide what keys are "the same" when
>sorting and grouping reducer inputs?  It uses a comparator.  If the
>comparator says compare(key1,key2)==0, then as far as Hadoop is concerned
>the keys are the same.
>
>So here the comparator only really checks the first int in the pair:
>
>"compare(0:97,0:96)?  well let's compare 0 and 0...
>Integer.compare(0,0)==0, so these are the same key."
>
>You have to be careful about the semantics of "equality" whenever you're
>using nonstandard comparators.
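
The grouping comparator John describes would look roughly like the sketch below: it compares only
the first int of the pair, so all (year, temperature) keys with the same year are grouped into a
single reduce call.  The class and method names here are illustrative, not necessarily the exact
code from the book.

    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    // Sketch of a grouping comparator that looks only at the first int of the
    // pair: keys that agree on the year are treated as "the same" key when
    // grouping reducer inputs.
    public class FirstGroupingComparator extends WritableComparator {
      public FirstGroupingComparator() {
        super(IntPair.class, true);  // true: create key instances so compare() gets deserialized objects
      }

      @Override
      public int compare(WritableComparable a, WritableComparable b) {
        int first1 = ((IntPair) a).getFirst();
        int first2 = ((IntPair) b).getFirst();
        return (first1 < first2) ? -1 : ((first1 == first2) ? 0 : 1);
      }
    }

It would be wired into the job with something like
job.setGroupingComparatorClass(FirstGroupingComparator.class), while the sort comparator also
considers the second int so that temperatures arrive in descending order within each year.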
