Date: Fri, 5 Aug 2011 08:50:02 +0800 (CST)
From: "Daniel,Wu" <hadoop_wu@163.com>
To: common-user@hadoop.apache.org
Subject: Re:Re:Re:Re:Re: one question in the book of "hadoop:definitive guide 2 edition"

Hi John,

Another finding: if I remove the loop over the values (i.e. remove "for (NullWritable iw : values)"), then the result is the MAX temperature for each year, while the original test I did returned the MIN temperature for each year.
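For context, my test follows the secondary-sort setup from the book, where keys are grouped by the first int of the pair and sorted so the larger second int comes first within a group. Roughly like the sketch below (IntPair is the book's class; the comparator class names are just illustrative, my real code may differ a bit):

    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    // Sort comparator: order by first int ascending, then second int DESCENDING,
    // so the record with the biggest second value is the first one in each group.
    public static class KeyComparator extends WritableComparator {
      protected KeyComparator() {
        super(IntPair.class, true);
      }
      @Override
      public int compare(WritableComparable w1, WritableComparable w2) {
        IntPair ip1 = (IntPair) w1;
        IntPair ip2 = (IntPair) w2;
        if (ip1.getFirst() != ip2.getFirst()) {
          return ip1.getFirst() < ip2.getFirst() ? -1 : 1;
        }
        // reverse order on the second int
        return ip1.getSecond() == ip2.getSecond() ? 0
             : (ip1.getSecond() < ip2.getSecond() ? 1 : -1);
      }
    }

    // Grouping comparator: only the first int decides whether two keys belong
    // to the same reduce group, so 0:97 and 0:96 count as "the same" key.
    public static class GroupComparator extends WritableComparator {
      protected GroupComparator() {
        super(IntPair.class, true);
      }
      @Override
      public int compare(WritableComparable w1, WritableComparable w2) {
        IntPair ip1 = (IntPair) w1;
        IntPair ip2 = (IntPair) w2;
        return ip1.getFirst() == ip2.getFirst() ? 0
             : (ip1.getFirst() < ip2.getFirst() ? -1 : 1);
      }
    }

In the driver these are wired up with something like job.setSortComparatorClass(KeyComparator.class) and job.setGroupingComparatorClass(GroupComparator.class).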
The book also mentions that the value is mutable, and I think the key might be mutable as well, meaning that as we loop over each value in the Iterable, the content of the key object gets overwritten. Since the input is in sorted order, if we don't loop at all (as in the new test), the key we see at the end of the reduce function is still the first record of the group, which has the MAX value. If we loop over every value in the list, say 100 times, the content of the key also changes 100 times, and the key we see at the end of the reduce function is the last key in the group, which has the MIN value. This theory of a mutable key can explain how the test works. I just need to figure out why each iteration of the statement "for (NullWritable iw : values)" can change the content of the key. If anyone knows, please help explain.

      public void reduce(IntPair key, Iterable<NullWritable> values, Context context)
          throws IOException, InterruptedException {
        int count = 0;
        /*
        for (NullWritable iw : values) {
          count++;
          System.out.print(key.getFirst());
          System.out.print(" : ");
          System.out.println(key.getSecond());
        }
        */
        // System.out.println("number of records for this group " + Integer.toString(count));
        System.out.println("-----------------biggest key is--------------------------");
        System.out.print(key.getFirst());
        System.out.print(" ----- ");
        System.out.println(key.getSecond());
        context.write(key, NullWritable.get());
      }
    }

-----------------biggest key is--------------------------
0 ----- 97
-----------------biggest key is--------------------------
4 ----- 99
-----------------biggest key is--------------------------
8 ----- 99
-----------------biggest key is--------------------------
12 ----- 97
-----------------biggest key is--------------------------
16 ----- 98

At 2011-08-04 20:51:01, "John Armstrong" wrote:
>On Thu, 4 Aug 2011 14:07:12 +0800 (CST), "Daniel,Wu" wrote:
>> I am using the new API (the release is from Cloudera). We can see from the
>> output that for each call of the reduce function, 100 records were processed,
>> but as the reduce is defined as
>> reduce(IntPair key, Iterable<NullWritable> values, Context context),
>> the key should be fixed (not change) during every single execution, but the
>> strange thing is that for each loop of Iterable<NullWritable> values, the
>> key is different!!!!!! Using your explanation, the same information
>> (0:97) should be repeated 100 times, but actually it is 0:97, 0:97, 0:96...
>> 0:0 as below.
>
>Ah, but they're NOT different! That's the whole point!
>
>Think carefully: how does Hadoop decide what keys are "the same" when
>sorting and grouping reducer inputs? It uses a comparator. If the
>comparator says compare(key1,key2)==0, then as far as Hadoop is concerned
>the keys are the same.
>
>So here the comparator only really checks the first int in the pair:
>
>"compare(0:97,0:96)? well let's compare 0 and 0...
>Integer.compare(0,0)==0, so these are the same key."
>
>You have to be careful about the semantics of "equality" whenever you're
>using nonstandard comparators.
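PS: my current guess at the answer to my own question is that the framework keeps reusing a single key object and deserializes each incoming record into it as the value iterator advances, so the one key parameter silently changes under us while we loop. If that is right, copying the key inside the loop should give a stable snapshot per record. A rough sketch of what I plan to try (assuming WritableUtils.clone works for IntPair, i.e. it has a no-argument constructor):

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.WritableUtils;

    // Inside the same reducer class as before: clone the key on every iteration
    // so the copy is independent of the single key object the framework reuses.
    public void reduce(IntPair key, Iterable<NullWritable> values, Context context)
        throws IOException, InterruptedException {
      IntPair first = WritableUtils.clone(key, context.getConfiguration());
      for (NullWritable iw : values) {
        IntPair snapshot = WritableUtils.clone(key, context.getConfiguration());
        // If the object-reuse theory is correct, 'snapshot' changes record by
        // record (0:97, 0:96, ... 0:0) even though 'key' is always the same object.
        System.out.println(snapshot.getFirst() + " ----- " + snapshot.getSecond());
      }
      // 'first' still holds the first record of the group (the MAX temperature),
      // while 'key' now holds whatever the last record contained (the MIN).
      context.write(first, NullWritable.get());
    }

If this prints a different pair on every iteration, that would confirm the mutable-key theory above.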