Mailing-List: contact common-user-help@hadoop.apache.org; run by ezmlm
Reply-To: common-user@hadoop.apache.org
Date: Wed, 3 Aug 2011 11:41:23 +0800 (CST)
From: "Daniel,Wu"
To: common-user@hadoop.apache.org
Message-ID: <12c0c12e.53e2.1318dbb3b3b.Coremail.hadoop_wu@163.com>
In-Reply-To: <71d43b79.3abb.1318d7f3d58.Coremail.hadoop_wu@163.com>
Subject: Re:Re:Re:Re: one quesiton in the book of "hadoop:definitive guide 2 edition"
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_66153_753960558.1312342883131"

------=_Part_66153_753960558.1312342883131
Content-Type: text/plain; charset=GBK
Content-Transfer-Encoding: 7bit

Or, to put the question another way: for the group of year 1900, should the input to the reducer look like this (key, value pairs):

(1900,35), null
(1900,34), null
(1900,33), null

or like this:

(1900,35), null
(1900,35), null ==> since (1900,34) is in the same group as (1900,35), it uses (1900,35) as the key.
(1900,35), null

At 2011-08-03 10:35:51, "Daniel,Wu" wrote:
>
>So the key of a group is determined by the first record to arrive in the group. If we have 3 records in a group
>1: (1900,35)
>2: (1900,34)
>3: (1900,33)
>
>and (1900,35) comes in as the first row, then the result key will be (1900,35). When the second row (1900,34) comes in, it won't impact the key of the group, meaning it will not overwrite the key (1900,35) with (1900,34). Correct?
>
>>in the KeyComparator, these are guaranteed to come in reverse order in the
>second slot. That is, if 35 is the maximum temperature then (1900,35) will
>come before ANY other (1900,t).
>Then as the GroupComparator does its
>thing, any time (1900,t) comes up it gets compared AND FOUND EQUAL TO
>(1900,35), and thus its (null) value is added to the (1900,35) group.
>
>The reducer then gets a (1900,35) key with an Iterable of null values,
>which it pretty much discards and just emits the key, which contains the
>maximum value.
------=_Part_66153_753960558.1312342883131--
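
To make the sort-then-group behaviour described in the quoted reply concrete, here is a small sketch in plain Python. It is only a simulation under stated assumptions, not Hadoop code: the names key_comparator, group_key and simulate_reduce_input are made up for illustration, and the sketch models just the ordering and grouping steps (sort by year ascending and temperature descending, then group by year). It does not model how the framework handles the key object while the reducer iterates over the values.

```python
from functools import cmp_to_key
from itertools import groupby


def key_comparator(a, b):
    """Hypothetical stand-in for the KeyComparator:
    year ascending, then temperature descending."""
    (year_a, temp_a), (year_b, temp_b) = a, b
    if year_a != year_b:
        return -1 if year_a < year_b else 1
    if temp_a != temp_b:
        return -1 if temp_a > temp_b else 1
    return 0


def group_key(key):
    """Hypothetical stand-in for the GroupComparator: only the year is
    compared, so (1900,34) is found equal to (1900,35)."""
    return key[0]


def simulate_reduce_input(map_output):
    """Sort the (key, value) records, then collect one (first key, values)
    pair per group, as the quoted reply describes."""
    ordered = sorted(
        map_output,
        key=cmp_to_key(lambda r1, r2: key_comparator(r1[0], r2[0])),
    )
    groups = []
    for _, records in groupby(ordered, key=lambda rec: group_key(rec[0])):
        records = list(records)
        first_key = records[0][0]  # the key seen when the group starts
        values = [value for _, value in records]
        groups.append((first_key, values))
    return groups


# Made-up sample records: composite key (year, temperature), null value.
map_output = [
    ((1900, 34), None),
    ((1901, 20), None),
    ((1900, 35), None),
    ((1900, 33), None),
    ((1901, 28), None),
]

reduce_input = simulate_reduce_input(map_output)
for key, values in reduce_input:
    print(key, values)
```

In this simulation each group is presented once, headed by the first key in sort order, so a reducer that emits that key without looking at the values emits the maximum temperature for the year.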