From: Raj Vishwanathan
Reply-To: Raj Vishwanathan
To: "common-user@hadoop.apache.org"
Subject: Re: Merge Reducers Output
Date: Tue, 31 Jul 2012 08:10:39 -0700 (PDT)
Message-ID: <1343747439.38228.YahooMailNeo@web160704.mail.bf1.yahoo.com>
References: <2078911720-1343709058-cardhu_decombobulator_blackberry.rim.net-910385435-@b2.c15.bise7.blackberry>
Mailing-List: contact common-user-help@hadoop.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="-672873354-1912038941-1343747439=:38228"

---672873354-1912038941-1343747439=:38228
Content-Type: text/plain; charset=iso-8859-1

Is there a requirement for the final reduce file to be sorted? If not,
wouldn't a map-only job (plus a combiner) and a merge-only job provide the
answer?

Raj

> ________________________________
> From: Michael Segel
> To: common-user@hadoop.apache.org
> Sent: Tuesday, July 31, 2012 5:24 AM
> Subject: Re: Merge Reducers Output
>
> You really don't want to run a single reducer unless you know that you
> don't have a lot of mappers.
>
> As long as the output data types and structure are the same as the input,
> you can run your code as the combiner, and then run it again as the
> reducer. Problem solved with one or two lines of code.
> If your input and output don't match, then you can use the existing code
> as a combiner, and then write a new reducer. It could as easily be an
> identity reducer too. (Don't know the exact problem.)
>
> So here's a silly question. Why wouldn't you want to run a combiner?
>
> On Jul 31, 2012, at 12:08 AM, Jay Vyas wrote:
>
>> It's not clear to me that you need custom input formats....
>>
>> 1) Getmerge might work or
>>
>> 2) Simply run a SINGLE reducer job (have mappers output static final int
>> key=1, or specify numReducers=1).
>>
>> In this case, only one reducer will be called, and it will read through
>> all the values.
>>
>> On Tue, Jul 31, 2012 at 12:30 AM, Bejoy KS wrote:
>>
>>> Hi
>>>
>>> Why not use 'hadoop fs -getmerge ' while copying files out of hdfs for
>>> the end users to consume.
>>> This will merge all the files in 'outputFolderInHdfs' into one file and
>>> put it in the local file system (lfs).
>>>
>>> Regards
>>> Bejoy KS
>>>
>>> Sent from handheld, please excuse typos.
>>>
>>> -----Original Message-----
>>> From: Michael Segel
>>> Date: Mon, 30 Jul 2012 21:08:22
>>> To:
>>> Reply-To: common-user@hadoop.apache.org
>>> Subject: Re: Merge Reducers Output
>>>
>>> Why not use a combiner?
>>>
>>> On Jul 30, 2012, at 7:59 PM, Mike S wrote:
>>>
>>>> As has been asked several times, I need to merge my reducers' output
>>>> files. Imagine I have many reducers which will generate 200 files. To
>>>> merge them together, I have written another map reduce job where each
>>>> mapper reads a complete file into memory and outputs it, and then a
>>>> single reducer merges them together. To do so, I had to write a custom
>>>> file input reader that reads the complete file into memory, and a
>>>> custom file output format that appends each reducer item's bytes
>>>> together.
>>>> This is how my mapper and reducer look:
>>>>
>>>> import java.io.IOException;
>>>> import org.apache.hadoop.io.BytesWritable;
>>>> import org.apache.hadoop.io.NullWritable;
>>>> import org.apache.hadoop.mapreduce.Mapper;
>>>> import org.apache.hadoop.mapreduce.Reducer;
>>>>
>>>> // Identity mapper: each input record is one whole file, passed through.
>>>> // The output key type is NullWritable (not IntWritable) so that it
>>>> // matches what map() writes and what the reducer expects.
>>>> public static class MapClass
>>>>         extends Mapper<NullWritable, BytesWritable, NullWritable, BytesWritable>
>>>> {
>>>>     @Override
>>>>     public void map(NullWritable key, BytesWritable value, Context context)
>>>>             throws IOException, InterruptedException
>>>>     {
>>>>         context.write(key, value);
>>>>     }
>>>> }
>>>>
>>>> // Single reducer: writes every file's bytes to the one output, in turn.
>>>> public static class Reduce
>>>>         extends Reducer<NullWritable, BytesWritable, NullWritable, BytesWritable>
>>>> {
>>>>     @Override
>>>>     public void reduce(NullWritable key, Iterable<BytesWritable> values,
>>>>             Context context) throws IOException, InterruptedException
>>>>     {
>>>>         for (BytesWritable value : values)
>>>>         {
>>>>             context.write(NullWritable.get(), value);
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> I still have to have one reducer, and that is a bottleneck. Please
>>>> note that I must do this merging because the users of my MR job are
>>>> outside my Hadoop environment and need the result as one file.
>>>>
>>>> Is there a better way to merge the reducers' output files?
>>>>
>>
>> --
>> Jay Vyas
>> MMSB/UCHC
>
---672873354-1912038941-1343747439=:38228--
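
For readers following the getmerge suggestion quoted in this thread, here is a minimal sketch of what such a merge amounts to: concatenating the part files of a job's output folder, in name order, into one file. It is only a local-filesystem analogue written with java.nio, not the Hadoop implementation and not the Hadoop API. The class name PartFileMerger and the method merge are hypothetical, and the sketch assumes the output files use the usual "part-" name prefix.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: concatenates part-* files of a local folder into one file. */
class PartFileMerger {

    /**
     * Appends every regular file in outputDir whose name starts with "part-"
     * to target, in lexicographic name order. Returns the number of files merged.
     */
    static int merge(Path outputDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        // The glob skips marker files such as _SUCCESS.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        Collections.sort(parts); // e.g. part-r-00000 before part-r-00001

        try (OutputStream out = Files.newOutputStream(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            for (Path part : parts) {
                // Streams the bytes across; no part file is held fully in memory.
                Files.copy(part, out);
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PartFileMerger <outputDir> <targetFile>");
            return;
        }
        int merged = merge(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("merged " + merged + " part files into " + args[1]);
    }
}
```

Because the files are streamed one after another, nothing here needs a second MapReduce job or a single reducer. The result is ordered by file name only, which is enough when, as Raj asks, the final file does not have to be globally sorted.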