hive-user mailing list archives

From Saurabh Nanda <saurabhna...@gmail.com>
Subject Re: SequenceFile compression on Amazon EMR not very good
Date Wed, 03 Feb 2010 08:56:14 GMT
Thanks, Zheng. Will do some more tests and get back.

Saurabh.
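[Editor's note: for the follow-up tests, this is a hedged sketch of the Hive session settings that, on a Hadoop/Hive setup of this era, typically all had to be present before block-compressed SequenceFile output was actually produced. The property names are assumptions based on contemporaneous Hadoop/Hive defaults, not taken from this thread; if `mapred.output.compression.type` stays at its `RECORD` default, the output is record-compressed regardless of the codec.]

```sql
-- Assumed session settings for block-compressed SequenceFile output:
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
```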

On Mon, Feb 1, 2010 at 1:22 PM, Zheng Shao <zshao9@gmail.com> wrote:

> I would first check whether it is really block compression or
> record compression.
> Also, maybe the block size is too small, but I am not sure whether
> that is tunable in SequenceFile.
>
> Zheng
>
> On Sun, Jan 31, 2010 at 9:03 PM, Saurabh Nanda <saurabhnanda@gmail.com>
> wrote:
> > Hi,
> >
> > The size of my Gzipped weblog files is about 35MB. However, upon enabling
> > block compression, and inserting the logs into another Hive table
> > (sequencefile), the file size bloats up to about 233MB. I've done similar
> > processing on a local Hadoop/Hive cluster, and while the compression is
> > not as good as gzipping, it still is not this bad. What could be going wrong?
> >
> > I looked at the header of the resulting file and here's what it says:
> >
> >
> SEQ^F"org.apache.hadoop.io.BytesWritable^Yorg.apache.hadoop.io.Text^A^@'org.apache.hadoop.io.compress.GzipCodec
> >
> > Does Amazon Elastic MapReduce behave differently or am I doing something
> > wrong?
> >
> > Saurabh.
> > --
> > http://nandz.blogspot.com
> > http://foodieforlife.blogspot.com
> >
>
>
>
> --
> Yours,
> Zheng
>
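[Editor's note: a hedged sketch of how one might check Zheng's first point directly from the header bytes Saurabh pasted. In the Hadoop SequenceFile format, after the `SEQ` magic and version byte come the key and value class names, then two boolean bytes: record-compressed and block-compressed. In the quoted header, `^A` (0x01) followed by `^@` (0x00) would indicate record compression with block compression off, which matches Zheng's suspicion. The parser below is simplified on one assumption: class-name lengths fit in a single vint byte, which holds for the stock Hadoop classes shown here.]

```python
# Hedged sketch: parse just enough of a Hadoop SequenceFile header to
# distinguish record compression from block compression.
# Assumption: class-name lengths fit in one vint byte (< 128 chars).

def parse_seqfile_header(data: bytes):
    assert data[:3] == b"SEQ", "not a SequenceFile"
    version = data[3]
    pos = 4
    names = []
    for _ in range(2):                        # key class, then value class
        n = data[pos]                         # one-byte vint length
        pos += 1
        names.append(data[pos:pos + n].decode("utf-8"))
        pos += n
    compressed = bool(data[pos]); pos += 1        # record-compression flag
    block_compressed = bool(data[pos]); pos += 1  # block-compression flag
    codec = None
    if compressed:
        n = data[pos]; pos += 1
        codec = data[pos:pos + n].decode("utf-8")
    return version, names[0], names[1], compressed, block_compressed, codec

# The header quoted above, with its control characters written out:
# SEQ^F = magic + version 6, ^A = compressed, ^@ = not block-compressed.
hdr = (b"SEQ\x06"
       b"\x22org.apache.hadoop.io.BytesWritable"
       b"\x19org.apache.hadoop.io.Text"
       b"\x01\x00"
       b"\x27org.apache.hadoop.io.compress.GzipCodec")
print(parse_seqfile_header(hdr))
```

Run against the quoted header, this reports compressed=True but block_compressed=False, i.e. record compression: each small record is gzipped individually, which can easily inflate output well past the size of the gzipped source logs.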



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
