hadoop-user mailing list archives

From Sanjay Subramanian <Sanjay.Subraman...@wizecommerce.com>
Subject Re: Now give .gz file as input to the MAP
Date Wed, 12 Jun 2013 04:44:25 GMT
hadoopConf.set("mapreduce.job.inputformat.class", "com.wizecommerce.utils.mapred.TextInputFormat");

hadoopConf.set("mapreduce.job.outputformat.class", "com.wizecommerce.utils.mapred.TextOutputFormat");

No special settings are required to read Gzip input beyond the two above.

If you want to output Gzip:


hadoopConf.set("mapreduce.output.fileoutputformat.compress", "true");

hadoopConf.set("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.GzipCodec");


Make sure the Gzip codec is listed in core-site.xml:
<!-- core-site.xml -->
<property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec</value>
</property>

I have a question:

Why are you using Gzip as input to the map? Gzip files are not splittable, so this only
makes sense if you have to read multi-line records (like the lines between a BEGIN and END
block in a log file) and send each one as a single record to the mapper.

Also, among the non-splittable codecs, Snappy is the better choice.
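To make the multi-line case concrete, here is a minimal plain-Java sketch of the grouping step only. It is not a Hadoop RecordReader; the class name `BlockRecordGrouper` and the literal `BEGIN`/`END` marker lines are assumptions made for illustration, and a real log format will have its own delimiters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlockRecordGrouper {
    // Hypothetical markers, assumed to appear alone on their own line.
    static final String BEGIN = "BEGIN";
    static final String END = "END";

    /** Joins the lines between each BEGIN and END marker into one record. */
    public static List<String> group(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null; // null means we are outside a block
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.equals(BEGIN)) {
                current = new StringBuilder(); // start a new record
            } else if (trimmed.equals(END)) {
                if (current != null) {
                    records.add(current.toString()); // emit the finished record
                    current = null;
                }
            } else if (current != null) {
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
            }
            // Lines outside any block are ignored in this sketch.
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "BEGIN", "a", "b", "END", "noise", "BEGIN", "c", "END");
        for (String record : group(log)) {
            System.out.println("record: " + record.replace("\n", " | "));
        }
    }
}
```

Grouping like this needs to see a block from its BEGIN line to its END line in order, which is why a file that is read start to finish by one mapper fits this case.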
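On what "non-splittable" means in practice, here is a small sketch using only the JDK's `java.util.zip` classes (no Hadoop involved). The class and method names are made up for this illustration. It compresses some lines and then tries to start decompressing from the beginning of the stream and from the middle of it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSplitDemo {
    /** Compresses the given text into a single gzip stream. */
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    /** Returns true if the bytes from offset to the end decode as a gzip stream. */
    public static boolean readableFrom(byte[] gz, int offset) {
        try (InputStream in = new GZIPInputStream(
                new ByteArrayInputStream(gz, offset, gz.length - offset))) {
            byte[] chunk = new byte[4096];
            while (in.read(chunk) != -1) {
                // Drain the stream; we only care whether decoding succeeds.
            }
            return true;
        } catch (IOException e) {
            // Missing header or corrupt data: cannot start reading here.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("log line ").append(i).append('\n');
        }
        byte[] gz = gzip(sb.toString());
        System.out.println("from start:  " + readableFrom(gz, 0));
        System.out.println("from middle: " + readableFrom(gz, gz.length / 2));
    }
}
```

The read from the start is expected to succeed and the read from the middle to fail, since a reader dropped at an arbitrary byte offset finds no gzip header to begin from. That is the reason a single .gz file ends up being processed by one mapper rather than being divided among several.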

Good Luck


sanjay

From: samir das mohapatra <samir.helpdoc@gmail.com>
Reply-To: user@hadoop.apache.org
Date: Tuesday, June 11, 2013 9:07 PM
To: cdh-user@cloudera.com, user@hadoop.apache.org, user-help@hadoop.apache.org
Subject: Now give .gz file as input to the MAP

Hi All,
    Has anyone worked on how to pass a .gz file as the input file for a MapReduce job?

Regards,
samir.
