hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: How to read LZO compressed files?
Date Mon, 02 Jan 2012 07:22:28 GMT
Hello Edward,

On Mon, Jan 2, 2012 at 11:04 AM, edward choi <mp2893@gmail.com> wrote:
> Hi,
>
> I'm having trouble trying to handle lzo compressed files.
> The input files are compressed by LzopCodec provided by hadoop-lzo package.
> And I am using Cloudera 3 update 2 version Hadoop.
>
> I don't need to split the input file, so there is no need telling me to
> index the input file and to use LzoTextInputFormat, unless that is the only
> way to handle lzo-compressed files.

It's OK to use LZO without splitting. There are no issues in doing that.

> I thought all I needed to do was set the job input format as
> "TextInputFormat" and hadoop will take care of the rest.
> When I do that, I don't get any error messages but log files tell me that
> input files are not decompressed at all. Input files are being handled as
> raw text files.

By "input files are being handled as raw text files," I assume you
mean that your mappers are receiving the raw compressed bytes, i.e.
the input is not being decoded at all?

Have you ensured that your io.compression.codecs property in
core-site.xml carries LzoCodec and LzopCodec canonical classnames, and
that your MR cluster was restarted with this change added?
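For reference, a minimal core-site.xml entry might look like the
following (the com.hadoop.compression.lzo classnames assume the
hadoop-lzo package shipped with CDH; adjust to match your install):

```xml
<!-- Register the LZO codecs alongside the defaults so Hadoop can
     auto-detect and decompress .lzo inputs. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

After editing, restart the MapReduce daemons so the change takes
effect cluster-wide.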

> Is there a specific way to read files with lzo extension?

The above config registers the ".lzo" extension for auto-detection of
LZO files, so you shouldn't need an explicit way.
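To illustrate the auto-detection: Hadoop's CompressionCodecFactory
maps each registered codec's default file extension to that codec and
picks the longest matching suffix for each input path. A rough sketch
of that lookup logic (illustrative only, not Hadoop's actual
implementation; the suffix-to-classname table below is an assumption
based on the codecs registered above):

```python
# Hedged sketch of extension-based codec detection, mimicking what
# CompressionCodecFactory does once the codecs are registered.
CODEC_BY_SUFFIX = {
    ".lzo": "com.hadoop.compression.lzo.LzopCodec",
    ".lzo_deflate": "com.hadoop.compression.lzo.LzoCodec",
    ".gz": "org.apache.hadoop.io.compress.GzipCodec",
}

def get_codec(path):
    """Return the codec classname for a path, or None if uncompressed."""
    # Try longer suffixes first so ".lzo_deflate" wins over ".lzo".
    for suffix in sorted(CODEC_BY_SUFFIX, key=len, reverse=True):
        if path.endswith(suffix):
            return CODEC_BY_SUFFIX[suffix]
    return None
```

With this in place, a plain TextInputFormat job sees decompressed
lines for any input whose extension matches a registered codec.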

-- 
Harsh J
