hadoop-mapreduce-user mailing list archives

From Rahul Bhattacharjee <rahul.rec....@gmail.com>
Subject Re: Now give .gz file as input to the MAP
Date Wed, 12 Jun 2013 04:53:52 GMT
Nothing special is required to process .gz files with MR. However, as
Sanjay mentioned, verify that the codecs are configured in core-site. Another
thing to note is that these files are not splittable.

You might want to use bz2 instead; those files are splittable.
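
A minimal sketch of what "not splittable" means in practice, using only the
JDK's java.util.zip classes rather than Hadoop itself. The class name
GzipSplitDemo and the helpers gzip and canReadFrom are made up for this
illustration. It compresses some text into one gzip stream and then tries to
start decompressing from the beginning and from the middle, which is roughly
what a second mapper would have to do if the file were split:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSplitDemo {

    // Compress a string into a single in-memory gzip stream.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Try to decompress starting at byte `offset`.
    // Returns true only if the decoder can read from there to the end.
    static boolean canReadFrom(byte[] data, int offset) {
        byte[] tail = Arrays.copyOfRange(data, offset, data.length);
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(tail))) {
            while (in.read() != -1) {
                // drain the stream; we only care whether decoding succeeds
            }
            return true;
        } catch (IOException e) {
            // Missing gzip header or corrupt deflate data
            return false;
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("log line ").append(i).append('\n');
        }
        byte[] data = gzip(sb.toString());

        System.out.println("readable from start:  " + canReadFrom(data, 0));
        System.out.println("readable from middle: " + canReadFrom(data, data.length / 2));
    }
}
```

The decoder needs the gzip header and all of the preceding compressed data, so
a reader dropped at an arbitrary offset has nothing to work with. That is why
one .gz file ends up in a single map task regardless of its size.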

Thanks,
Rahul


On Wed, Jun 12, 2013 at 10:14 AM, Sanjay Subramanian <
Sanjay.Subramanian@wizecommerce.com> wrote:

>  hadoopConf.set("mapreduce.job.inputformat.class",
> "com.wizecommerce.utils.mapred.TextInputFormat");
>
> hadoopConf.set("mapreduce.job.outputformat.class",
> "com.wizecommerce.utils.mapred.TextOutputFormat");
>  No special settings are required for reading Gzip beyond the ones above.
>
>  If you want to output Gzip:
>
>  hadoopConf.set("mapreduce.output.fileoutputformat.compress", "true");
>
> hadoopConf.set("mapreduce.output.fileoutputformat.compress.codec",
> "org.apache.hadoop.io.compress.GzipCodec");
>
> Make sure the Gzip codec is defined in core-site.xml:
>  <!-- core-site.xml -->
>  <property>
>      <name>io.compression.codecs</name>
>      <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec</value>
>  </property>
>
>  I have a question
>
>  Why are you using Gzip as input to the map? These files are not
> splittable... unless you have to read multi-line records (like the lines
> between a BEGIN and END block in a log file) and send each one to the
> mapper as a single record.
>
>  Also, among non-splittable codecs, Snappy is the better choice.
>
>  Good Luck
>
>
>  sanjay
>
>   From: samir das mohapatra <samir.helpdoc@gmail.com>
> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Date: Tuesday, June 11, 2013 9:07 PM
> To: "cdh-user@cloudera.com" <cdh-user@cloudera.com>, "
> user@hadoop.apache.org" <user@hadoop.apache.org>, "
> user-help@hadoop.apache.org" <user-help@hadoop.apache.org>
> Subject: Now give .gz file as input to the MAP
>
>   Hi All,
>     Has anyone worked out how to pass a .gz file as input to a
> MapReduce job?
>
> Regards,
> samir.
>
