hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Re: bz2 Splits.
Date Tue, 28 Jul 2009 15:02:45 GMT
On Tue, Jul 28, 2009 at 2:22 AM, Zheng Shao<zshao9@gmail.com> wrote:
> Yes we do compress all tables.
>
> Zheng
>
> On Mon, Jul 27, 2009 at 11:08 PM, Saurabh Nanda<saurabhnanda@gmail.com> wrote:
>>
>>> In our setup, we didn't change io.seqfile.compress.blocksize (1MB) and
>>> it's still fairly good.
>>> You are free to try 100MB for better compression ratio, but I would
>>> recommend to keep the default setting to minimize the possibilities of
>>> hitting unknown bugs.
>>
>> Makes sense. Better compression brought down a count(1) query from 100+ sec
>> down to 40sec. The ETL phase is now taking 510sec as opposed to 700sec
>> earlier.
>>
>> Do you also compress all tables, not just the raw ones? Would you recommend
>> it?
>>
>> Saurabh.
>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>>
>
>
>
> --
> Yours,
> Zheng
>

Saurabh,

Thank you for the wiki page on this. Keep up the good work, and please
post all your findings about compression. Many people (including me)
will benefit from an explanation of the different types of compression
available and the trade-offs of the various codecs and options. I am
really excited, as I have (shamefully) had some large tables with
multiple text files building up, and the thought of smaller data and
faster queries is giving me goosebumps.
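For anyone following along, the settings discussed upthread can be sketched as Hive session settings. This is a minimal sketch, assuming SequenceFile-backed tables on a Hive/Hadoop release of that era; treat the exact values as illustrative rather than recommended:

```sql
-- Illustrative session settings, assuming SequenceFile-backed tables.
-- io.seqfile.compress.blocksize defaults to ~1 MB; the advice above is to
-- keep that default rather than raising it to 100 MB for better ratios.
SET hive.exec.compress.output=true;         -- compress ETL/query output
SET mapred.output.compression.type=BLOCK;   -- block compression for SequenceFiles
SET io.seqfile.compress.blocksize=1000000;  -- compression block size in bytes
```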

Edward
