Message-ID: <34630.84008.qm@web15902.mail.cnb.yahoo.com>
In-Reply-To: <4B607064.4000008@cs.cmu.edu>
Date: Thu, 28 Jan 2010 01:57:58 +0800 (CST)
From: Gang Luo
Subject: Re: When exactly is combiner invoked?
To: common-user@hadoop.apache.org

When the map function generates intermediate results and first sends them to the buffer, partitioning and sorting start, and if you specify a combiner, it is invoked at this point. This process runs in parallel with the map function. When the map function finishes, all the spills on disk are merged, and the combiner is invoked again at that time.

-Gang



----- Original Message -----
From: Le Zhao
To: common-user@hadoop.apache.org
Sent: 2010/1/27 (Wed) 11:57:08 AM
Subject: When exactly is combiner invoked?

Hi - the combiner works on a chunk of mapper output data, but where exactly is the chunk cut off, or when exactly will the chunk be fed to the combiner?

1. Will it be after the mapper finishes processing an input record?
2. Will it be after the mapper outputs a key value pair that hits the memory limit?

This is important to know, because strategy 1 gives a stronger guarantee about output record duplication than strategy 2, say when one input record for the mapper can correspond to multiple output records with the same key.

Thanks,
Le
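
To make the spill-then-merge behaviour described in the reply concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not Hadoop's implementation: the names `run_map_task`, `combine`, `word_count_map` and the `buffer_limit` parameter are hypothetical, the buffer limit is counted in records rather than bytes, and spilling happens inline rather than in a background thread. Real Hadoop may also apply the combiner zero or more times, so this shows only one possible schedule.

```python
from collections import defaultdict


def combine(pairs):
    """Toy combiner: sum the values of each key, return pairs sorted by key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def run_map_task(records, map_fn, buffer_limit):
    """Simulate one map task (assumption: limit is a record count, not bytes).

    Map output is appended to an in-memory buffer. Whenever the buffer holds
    `buffer_limit` pairs it is combined and written out as one spill. After
    the last input record, the remaining pairs are spilled and all spills
    are merged, with the combiner applied once more over the merged data.
    """
    buffer, spills = [], []
    for record in records:
        for pair in map_fn(record):
            buffer.append(pair)
            if len(buffer) >= buffer_limit:
                # Spill is triggered by buffer fill, not by input-record boundary.
                spills.append(combine(buffer))
                buffer = []
    if buffer:
        spills.append(combine(buffer))
    merged = combine(pair for spill in spills for pair in spill)
    return spills, merged


def word_count_map(line):
    """Hypothetical map function: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


spills, merged = run_map_task(["a a a b", "b a"], word_count_map, buffer_limit=2)
for i, spill in enumerate(spills):
    print("spill", i, spill)
print("merged", merged)
```

In this model the three `a` pairs produced by the first input record end up in two different spills, which corresponds to option 2 in the question rather than option 1. Because the combiner runs once per spill and again at merge time, it has to give the same final result no matter how the pairs are grouped, as summation does here.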