hive-user mailing list archives

From yongqiang he <heyongqiang...@gmail.com>
Subject Re: Compression and Indexing
Date Thu, 15 Sep 2011 21:41:30 GMT
>>Question 1:
Indexing should work with both, but I suggest you use block compression.

>>Question 3 (and perhaps, the most important):
Block-based compression.
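
To illustrate why block compression is usually the better choice, here is a rough Python sketch. It is not Hive or Hadoop code, and it does not reproduce the real SequenceFile format. It only compares compressing each record on its own with compressing batches of records together, using zlib. The sample rows, the block size of 100 records, and the function names are all assumptions made for the illustration.

```python
import zlib


def record_compressed_size(records):
    """Total size when every record is compressed on its own."""
    return sum(len(zlib.compress(r)) for r in records)


def block_compressed_size(records, block_records=100):
    """Total size when records are batched into blocks and each
    block is compressed as a single unit."""
    total = 0
    for start in range(0, len(records), block_records):
        block = b"".join(records[start:start + block_records])
        total += len(zlib.compress(block))
    return total


# Hypothetical tab-delimited rows standing in for table data.
records = [
    f"{i}\tEUR/USD\t1.3{i % 100:02d}\n".encode()
    for i in range(1000)
]

raw_size = sum(len(r) for r in records)
record_size = record_compressed_size(records)
block_size = block_compressed_size(records)

print("raw bytes:              ", raw_size)
print("record-compressed bytes:", record_size)
print("block-compressed bytes: ", block_size)
```

With short rows like these, per-record compression has very little redundancy to work with and pays a fixed overhead on every record, while the compressor can exploit repetition across rows when many records share one block. The exact ratio on real data will depend on the codec and on the row contents.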


On Thu, Sep 15, 2011 at 2:16 PM, Mark Grover <mgrover@oanda.com> wrote:
> Hi all,
> I've a question regarding compression and indexing.
>
> I would like to compress our Hive data (currently stored as SequenceFile).
> Also, I have an index on this table and would like to maintain the index as
> well (i.e. keep using it).
>
> Question 1:
> Sequence file compression can be block or record based. For indexing to
> work, do I need to have block based compression? If both block and record
> based compression can work with indexing, can someone provide insight into
> which to use when?
>
> Question 2:
> BZip2 is also a block based compression and is splittable in Hadoop. Do you
> see any issues with storing data in BZip2 files and using indexing on that
> data?
>
> Question 3 (and perhaps, the most important):
> What are the best practices for compression (with or without indexing)? Are
> folks typically using Sequence File compression as compared to other
> compressions (like BZip2)? If using Sequence File compression, are folks
> using record based or block based?
>
>
> Thank you in advance!
> Mark
>
> --
> Mark Grover, Business Intelligence Analyst
> OANDA Corporation
>
> www: oanda.com www: fxtrade.com
> e: mgrover@oanda.com
>
> "Best Trading Platform" - World Finance's Forex Awards 2009.
> "The One to Watch" - Treasury Today's Adam Smith Awards 2009.
>
>
