From: Tony Burton
To: "user@hadoop.apache.org",
"Andy.Kartashov@mpac.ca"
Date: Wed, 19 Dec 2012 09:19:42 +0000
Subject: RE: Map output compression in Hadoop 1.0.3

Hi Andy and list,

Apologies - I've not been looking at my list inbox for a while, so I missed this request. I'm running some tests as I type, and will report back when they're done. I'm running the same job with the bzip2, gzip and snappy codecs versus no map output compression. I guess I should include LZO in the comparison too, but the codec wasn't obvious in the o.a.h.io.compress.* areas of Hadoop. If someone could point out where to find this codec, that'd be really handy. If not I could always google it :)

Tony

________________________________________
From: Kartashov, Andy [Andy.Kartashov@mpac.ca]
Sent: 29 November 2012 16:09
To: user@hadoop.apache.org
Subject: RE: Map output compression in Hadoop 1.0.3

Tony,

Can you please share with us the performance improvement (if any) after using compression in map.output? I was about to start looking into it myself. What compression codec did you use?

Rgds,
AK

-----Original Message-----
From: Tony Burton [mailto:TBurton@SportingIndex.com]
Sent: Wednesday, November 28, 2012 6:38 AM
To:
Subject: RE: Map output compression in Hadoop 1.0.3

Also, another point that prompted my initial question: I'd come across "mapred.compress.map.output" in the documentation, but I wasn't 100% sure whether there has been, or will be, any equivalence or correspondence between config settings like this one and the naming of the stable and new APIs.

For example, we've got o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf, as previously mentioned, from the "mapred" and "mapreduce" parts of the API.

Are config settings that begin with mapred.* related to the stable API, with the implication that there's a mapreduce.* equivalent (e.g. mapred.compress.map.output vs mapreduce.compress.map.output), or am I seeing a connection that doesn't exist?

(Hope that makes sense!)

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: 28 November 2012 11:25
To:
Subject: Re: Map output compression in Hadoop 1.0.3

Hi,

The property mapred.output.compress, as its name reads, controls job-output compression, not intermediate/transient data compression, which is what you mean by "Map output compression".

Also note that this property is a per-job one and can be toggled on/off for each job specifically, if a user wants.

These should be all the ways, exhaustively, for MR1, to turn on "Map output compression":

1. Set "mapred.compress.map.output" to true in your client's mapred-site.xml to turn it on for all jobs run from that client machine.
2. Set the above in the cluster, with <final>true</final> at every node (JT plus TTs), and restart them, to turn it on for all jobs regardless of what the job itself specifies.
3. Turn it on on a per-job basis:
   3.1. Stable API: JobConf.setCompressMapOutput(true);
   3.2. New API: Job.getConfiguration().setBoolean("mapred.compress.map.output", true);

On Wed, Nov 28, 2012 at 4:42 PM, Tony Burton wrote:
> Hi,
>
> Quick question: What's the best way to turn on Map Output Compression
> in Hadoop 1.0.3? The tutorial at
> http://hadoop.apache.org/docs/r1.0.3/mapred_tutorial.html says to use
> JobConf.setCompressMapOutput(boolean), but I'm using
> o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf.
>
> Is it simply a case of using getConf.set("mapred.output.compress",
> true) then constructing my Job from the Configuration object, or is
> there a more direct way that I've missed?
>
> Thanks,
>
> Tony

--
Harsh J
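
For reference, option 1 in Harsh's reply (the client-side mapred-site.xml setting) might look roughly like the fragment below. This is a minimal sketch, not a tested configuration: the first property is the one named in the thread, while the second, mapred.map.output.compression.codec with GzipCodec as its value, is an assumption added for illustration, since the thread does not say how the codec is selected.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Named in the thread: compress intermediate (map-side) output. -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <!-- Assumption, not from the thread: choose the codec used for map output. -->
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>
</configuration>
```

Swapping the codec class in the second property would be one way to compare the bzip2, gzip and snappy runs mentioned above, assuming the chosen codec is available on every node.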