hadoop-mapreduce-user mailing list archives

From Niels Basjes <Ni...@basjes.nl>
Subject Re: Is a Block compressed (GZIP) SequenceFile splittable in MR operation?
Date Mon, 31 Jan 2011 08:36:23 GMT
Hi,

2011/1/31 Sean Bigdatafun <sean.bigdatafun@gmail.com>:
> GZIP is not splittable.

Correct, gzip is a stream compression format, which effectively means
you can only start decompressing at the beginning of the data.
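
To illustrate the point, here is a minimal Python sketch (only the standard gzip and zlib modules, nothing Hadoop-specific; the helper name try_decompress_from and the sample records are made up for the example). It compresses some data as a single gzip stream and then tries to decompress it both from the start and from an arbitrary offset in the middle:

```python
import gzip
import zlib

# Sample data: 10,000 small text records compressed as ONE gzip stream.
payload = b"".join(b"record-%06d\n" % i for i in range(10000))
compressed = gzip.compress(payload)


def try_decompress_from(blob, offset):
    """Try to gunzip `blob` starting at byte `offset`.

    Returns the decompressed bytes, or None if zlib rejects the input.
    wbits=31 tells zlib to expect a gzip header and trailer.
    """
    try:
        return zlib.decompress(blob[offset:], wbits=31)
    except zlib.error:
        return None


# Starting at byte 0 works; starting mid-stream does not, because the
# decoder has neither the gzip header nor the preceding stream state.
from_start = try_decompress_from(compressed, 0)
from_middle = try_decompress_from(compressed, len(compressed) // 2)

print("from start ok:", from_start == payload)
print("from middle ok:", from_middle is not None)
```

A reader handed only the second half of such a file has no usable starting point, which is why a plain .gz file normally ends up as a single input split.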

> Does that mean a GZIP block compressed sequencefile can't take advantage of MR parallelism?

AFAIK it should still be splittable, at the boundaries of the blocks
in which the data was compressed.
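
A toy Python sketch of why that works, under loud assumptions: this is NOT the real SequenceFile layout (which has its own header, sync markers and record framing), and write_blocks / read_block are hypothetical helpers invented for the illustration. The only idea it shows is that when each block is compressed on its own, a reader can start at any block boundary without touching the earlier blocks:

```python
import gzip


def write_blocks(records, records_per_block):
    """Compress records in independent gzip blocks.

    Returns (blob, offsets): the concatenated container and the byte
    offset at which each block starts. Each block is stored as a
    4-byte big-endian length followed by its own complete gzip stream.
    """
    blob = bytearray()
    offsets = []
    for i in range(0, len(records), records_per_block):
        chunk = gzip.compress(b"".join(records[i:i + records_per_block]))
        offsets.append(len(blob))
        blob += len(chunk).to_bytes(4, "big") + chunk
    return bytes(blob), offsets


def read_block(blob, offset):
    """Decompress the single block starting at `offset`, nothing else."""
    size = int.from_bytes(blob[offset:offset + 4], "big")
    return gzip.decompress(blob[offset + 4:offset + 4 + size])


records = [b"record-%06d\n" % i for i in range(1000)]
blob, offsets = write_blocks(records, records_per_block=100)

# Any block can be read on its own, e.g. the eighth one (records 700-799),
# so different workers could each take a different range of blocks.
eighth = read_block(blob, offsets[7])
print(len(offsets), "blocks; block 7 starts with", eighth[:14])
```

Here the block offsets are kept in a side list for simplicity; a real container needs some way to find block boundaries from an arbitrary byte position in the file.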

> How to control the size of block to be compressed in SequenceFile?

Can't help you with that one.

-- 
Kind regards,

Niels Basjes
