hadoop-mapreduce-user mailing list archives

From Tom White <...@cloudera.com>
Subject Re: Gzipped input files
Date Fri, 08 Oct 2010 21:34:25 GMT
Decompression is done by the RecordReader. Text-based input formats use
LineRecordReader, which decompresses the input automatically. Other
formats do not (e.g. sequence files, which have their own internal
compression). So it depends on which RecordReader your custom input
format uses.
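
To illustrate the idea, here is a minimal sketch of a reader that picks a
decompressor from the filename extension before reading lines. It uses
only the JDK (java.util.zip) so that it runs without a cluster; the class
and method names (ExtensionAwareLineReader, open, readLines, gzip) are
made up for this example and are not Hadoop APIs. In a real custom
RecordReader the analogous step would presumably be to look up the codec
with CompressionCodecFactory.getCodec(Path) and, if it is non-null, wrap
the file stream with CompressionCodec.createInputStream(InputStream).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for the stream setup a record reader performs.
class ExtensionAwareLineReader {

    // Choose a decompressor from the file name; pass the stream through
    // unchanged when the extension is not recognised.
    static InputStream open(String fileName, InputStream raw) throws IOException {
        if (fileName.endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Read every line from the (possibly compressed) stream as UTF-8 text.
    static List<String> readLines(String fileName, InputStream raw) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(open(fileName, raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Helper for the demo: gzip a string in memory.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("first\nsecond\n");
        // The ".gz" suffix triggers decompression before line splitting.
        System.out.println(readLines("part-00000.gz", new ByteArrayInputStream(compressed)));
    }
}
```

The point of the sketch is only that the choice of decompressor happens
where the stream is opened, which is why a custom input format with its
own reader has to do this step itself.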

Cheers,
Tom

On Fri, Oct 8, 2010 at 1:58 PM, Patrick Marchwiak <pmarchwiak@gmail.com> wrote:
> Hi,
> The Hadoop Definitive Guide book states that "if your input files are
> compressed, they will be automatically decompressed as they are read
> by MapReduce, using the filename extension to determine the codec to
> use" (in the section titled "Using Compression in MapReduce"). I'm
> trying to run a mapreduce job with some gzipped files as input and
> this isn't working. Does support for this have to be built into the
> input format? I'm using a custom one that extends from
> FileInputFormat. Is there an additional configuration option that
> should be set?  I'd like to avoid having to do decompression from
> within my map.
>
> I'm using the new API and the CDH3b2 distro.
>
> Thanks.
>
