From: Sanjay Subramanian
To: "user@hadoop.apache.org", "cdh-user@cloudera.com", "user-help@hadoop.apache.org"
Subject: Re: Now give .gz file as input to the MAP
Date: Wed, 12 Jun 2013 04:44:25 +0000

hadoopConf.set("mapreduce.job.inputformat.class", "com.wizecommerce.utils.mapred.TextInputFormat");

hadoopConf.set("mapreduce.job.outputformat.class", "com.wizecommerce.utils.mapred.TextOutputFormat");

No special settings are required for reading Gzip input beyond the two above.

If you want to output Gzip:

hadoopConf.set("mapreduce.output.fileoutputformat.compress", "true");

hadoopConf.set("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.GzipCodec");

 
Make sure the Gzip codec is defined in core-site.xml:
<!-- core-site.xml -->
<property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
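
If you want a quick way to check that setting, here is a rough sketch that parses the contents of a core-site.xml with the JDK's XML parser and lists the entries of io.compression.codecs. CodecConfigCheck and configuredCodecs are made-up names for this example, not Hadoop APIs; Hadoop itself reads the property through its own Configuration class.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop.
public class CodecConfigCheck {

    /** Returns the class names listed under io.compression.codecs, or an empty list. */
    static List<String> configuredCodecs(String coreSiteXml) {
        List<String> codecs = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(coreSiteXml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                Node name = prop.getElementsByTagName("name").item(0);
                Node value = prop.getElementsByTagName("value").item(0);
                if (name == null || value == null) {
                    continue; // skip incomplete <property> entries
                }
                if (!"io.compression.codecs".equals(name.getTextContent().trim())) {
                    continue;
                }
                // The value is a comma-separated list of codec class names.
                for (String entry : value.getTextContent().split(",")) {
                    if (!entry.trim().isEmpty()) {
                        codecs.add(entry.trim());
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse core-site.xml content", e);
        }
        return codecs;
    }

    public static void main(String[] args) {
        String xml = "<configuration><property>"
                + "<name>io.compression.codecs</name>"
                + "<value>org.apache.hadoop.io.compress.GzipCodec,"
                + "org.apache.hadoop.io.compress.DefaultCodec</value>"
                + "</property></configuration>";
        boolean hasGzip = configuredCodecs(xml).contains("org.apache.hadoop.io.compress.GzipCodec");
        System.out.println("Gzip codec configured: " + hasGzip);
    }
}
```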

I have a question:

Why are you using Gzip as input to the map? Gzip files are not splittable, so each file is handled by a single mapper. I would avoid it unless you have to read multi-line records (like the lines between a BEGIN and END block in a log file) and send each one to the mapper as a single record.

Also, among the non-splittable codecs, Snappy is the better choice.
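
To illustrate the splittability point, here is a minimal sketch using only the JDK (java.util.zip), not Hadoop. It compresses a sample log and then tries to start decompressing from the middle of the file, which is roughly what a second mapper would have to do if the file were split. GzipSplitDemo, sampleLog, gzip and canDecompressFrom are made-up names for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, JDK only.
public class GzipSplitDemo {

    /** Builds a small multi-line log with BEGIN/END blocks. */
    static String sampleLog(int blocks) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < blocks; i++) {
            sb.append("BEGIN\n").append("record ").append(i).append("\nEND\n");
        }
        return sb.toString();
    }

    /** Gzip-compresses the text into a byte array. */
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true only if the bytes from the given offset onward decompress cleanly. */
    static boolean canDecompressFrom(byte[] compressed, int offset) {
        byte[] tail = Arrays.copyOfRange(compressed, offset, compressed.length);
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(tail))) {
            byte[] chunk = new byte[4096];
            while (in.read(chunk) != -1) {
                // drain the stream; the content itself is not needed here
            }
            return true;
        } catch (IOException e) {
            // A missing gzip header or a corrupt stream ends up here.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] gz = gzip(sampleLog(1000));
        System.out.println("from start:  " + canDecompressFrom(gz, 0));
        System.out.println("from middle: " + canDecompressFrom(gz, gz.length / 2));
    }
}
```

Reading from offset 0 works, while reading from the middle fails because there is no gzip header there and the compressed stream depends on everything before it. That is why the whole file has to go to one reader.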

Good Luck


sanjay 

From: samir das mohapatra <samir.helpdoc@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Tuesday, June 11, 2013 9:07 PM
To: "cdh-user@cloudera.com" <cdh-user@cloudera.com>, "user@hadoop.apache.org" <user@hadoop.apache.org>, "user-help@hadoop.apache.org" <user-help@hadoop.apache.org>
Subject: Now give .gz file as input to the MAP

Hi All,
    Has anyone worked on how to pass a .gz file as input to a MapReduce job?
 
Regards,
samir.
