hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: LZO with sequenceFile
Date Sun, 26 Feb 2012 18:49:26 GMT
Hi Mohit,

On Sun, Feb 26, 2012 at 10:42 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> Thanks! Some questions I have:
> 1. Would it work with sequence files? I am using
> SequenceFileAsTextInputStream

Yes, you just need to set the right codec when you write the file.
Reading then works the same as reading an uncompressed sequence file.

The codec class names are stored as metadata in the sequence file
header and are read back to load the right codec for the reader. You
therefore don't have to specify a 'reader' codec once you have written
a file with the codec of your choice.
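
To make the header idea concrete, here is a minimal sketch in plain
Java (java.util.zip only). It is a toy container, not the real
SequenceFile format or the Hadoop API: the class CodecHeaderSketch, its
methods and the codec names "gzip" and "deflate" are all made up for
illustration. It only shows the principle that the writer records the
codec name and the reader picks the decompressor from it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical toy format: [codec name as UTF string][compressed payload].
class CodecHeaderSketch {

    // Writer side: the caller chooses the codec; its name goes into the header.
    static byte[] write(String codecName, String payload) {
        try {
            ByteArrayOutputStream file = new ByteArrayOutputStream();
            DataOutputStream header = new DataOutputStream(file);
            header.writeUTF(codecName);
            header.flush();
            try (OutputStream body = compressor(codecName, file)) {
                body.write(payload.getBytes(StandardCharsets.UTF_8));
            }
            return file.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reader side: no codec argument; the codec is looked up from the header.
    static String read(byte[] fileBytes) {
        try {
            ByteArrayInputStream file = new ByteArrayInputStream(fileBytes);
            String codecName = new DataInputStream(file).readUTF();
            try (InputStream body = decompressor(codecName, file)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = body.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static OutputStream compressor(String name, OutputStream out) throws IOException {
        if (name.equals("gzip")) return new GZIPOutputStream(out);
        if (name.equals("deflate")) return new DeflaterOutputStream(out);
        throw new IllegalArgumentException("unknown codec: " + name);
    }

    static InputStream decompressor(String name, InputStream in) throws IOException {
        if (name.equals("gzip")) return new GZIPInputStream(in);
        if (name.equals("deflate")) return new InflaterInputStream(in);
        throw new IllegalArgumentException("unknown codec: " + name);
    }

    public static void main(String[] args) {
        byte[] file = write("gzip", "key1\tvalue1");
        System.out.println(read(file)); // read() was never told the codec
    }
}
```

In Hadoop itself the same role is played by the codec class name that
the writer stores in the SequenceFile header.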

> 2. If I use SequenceFile.CompressionType.RECORD or BLOCK would it still
> split the files?

Yes, SequenceFiles are a natively splittable file format, designed for
HDFS and MapReduce. Compressed sequence files are thus splittable too,
with either RECORD or BLOCK compression.

In most cases you want BLOCK compression. RECORD compression is worth
considering only if your records are large and you expect the
compression algorithm to do better on a single, complete record than
on a batch of records.
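
A rough way to see why BLOCK usually wins for small records is to
compress the same records one at a time and then as a single batch.
The sketch below uses java.util.zip.Deflater directly rather than
Hadoop; the class RecordVsBlockSketch and the sample record layout are
assumptions made up for this illustration, and real numbers will
depend on your data and codec.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical comparison of per-record vs. batched compression sizes.
class RecordVsBlockSketch {

    // Returns the number of bytes Deflater produces for the given input.
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    // Small, similar-looking records (made-up sample data).
    static String[] sampleRecords(int count) {
        String[] records = new String[count];
        for (int i = 0; i < count; i++) {
            records[i] = "user=" + i + ",action=click,status=ok";
        }
        return records;
    }

    // "RECORD"-style: every record is compressed on its own.
    static int perRecordSize(String[] records) {
        int total = 0;
        for (String r : records) {
            total += compressedSize(r.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // "BLOCK"-style: many records are compressed together in one pass.
    static int blockSize(String[] records) {
        String joined = String.join("\n", records);
        return compressedSize(joined.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String[] records = sampleRecords(1000);
        System.out.println("per-record total: " + perRecordSize(records));
        System.out.println("single block:     " + blockSize(records));
    }
}
```

With short records each compression call pays its own fixed overhead
and sees too little data to find repetition, which is why batching
tends to help.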

> 3. I am also using CDH's 20.2 version of hadoop.

http://www.cloudera.com/assets/images/diagrams/whats-in-a-version.png :)

Harsh J
