Date: Sun, 29 Jan 2012 00:06:40 -0800 (PST)
From: Jianhui Zhang
To: mapreduce-user@hadoop.apache.org
Subject: anyway to do "local" reduce like the combiner does?

I have a problem at hand that seems to need "local" reducing:

I have a large data input, in which each line is a data mapping, something like "name : attribute". The attributes for the same name are usually pretty close in the file, so they are very likely to be processed by the same mapper. I need to persist the "name:attributes" somewhere else (think DB). It would be optimal if I could combine the attributes of the same name together and persist them only once. Attributes for the same name from different mappers can be safely persisted separately.

I don't want to use reducers because of the network traffic. What I need is exactly what a combiner does, but as far as I can tell, combiners are not guaranteed to run, or to run only once (correct me if I'm wrong here), so I guess I am not supposed to implement the persistence in the combiner.

Has anybody had a similar problem before? What was your solution?

I appreciate your help.

Thanks,
James

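One pattern that is often suggested for this kind of "local" reduce is in-mapper combining: buffer the attributes per name inside the map task and persist them once when the task finishes. The following is only a minimal sketch under stated assumptions, not a definitive implementation. It uses no Hadoop classes; `LocalReduceMapper`, its `sink` callback, and the line parsing are hypothetical names chosen for illustration. The `map` and `cleanup` methods are meant to mirror the per-record and end-of-task hooks of `org.apache.hadoop.mapreduce.Mapper`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a Hadoop Mapper: map() is called once per input
// line and cleanup() once at the end of the task. No Hadoop classes are used.
class LocalReduceMapper {
    // Attributes buffered per name for the lifetime of this map task.
    private final Map<String, List<String>> buffer = new LinkedHashMap<>();
    // Hypothetical persistence hook (for example a DB write), supplied by the caller.
    private final BiConsumer<String, List<String>> sink;

    LocalReduceMapper(BiConsumer<String, List<String>> sink) {
        this.sink = sink;
    }

    // Parse one "name : attribute" line and buffer it instead of emitting it.
    void map(String line) {
        int sep = line.indexOf(':');
        if (sep < 0) {
            return; // skip malformed lines
        }
        String name = line.substring(0, sep).trim();
        String attribute = line.substring(sep + 1).trim();
        buffer.computeIfAbsent(name, k -> new ArrayList<>()).add(attribute);
    }

    // Persist each buffered name exactly once per task, then release the buffer.
    void cleanup() {
        buffer.forEach(sink);
        buffer.clear();
    }
}
```

In a real job this logic would presumably live in a `Mapper` subclass, with the job configured as map-only so that no shuffle takes place. The buffer grows with the number of distinct names seen by the task, so if memory is a concern, one option (again an assumption, relying on attributes for a name being close together in the file) is to flush a name's entry as soon as a different name appears.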