hive-user mailing list archives

From Mark Grover <mgro...@oanda.com>
Subject Re: Compression and Indexing
Date Fri, 16 Sep 2011 14:06:53 GMT
Thanks, Yongqiang!

Could you please confirm my understanding of how to use block compression?

Currently, I set these properties before populating the table that
should hold the compressed data:
SET io.seqfile.compression.type=BLOCK;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
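Put together, a load under these settings might look roughly like the sketch below. It is only a sketch: the table names `raw_events` and `raw_events_seq` and their columns are hypothetical placeholders, and it assumes Hive 0.7-era syntax. It shows the SET statements running in the same session before the INSERT that writes the compressed SequenceFile data.

```sql
-- Hypothetical tables: raw_events (existing source) and raw_events_seq (compressed target).
CREATE TABLE raw_events_seq (
  event_time STRING,
  payload    STRING
)
STORED AS SEQUENCEFILE;

-- The settings quoted above: block-compress the SequenceFile output with gzip.
SET io.seqfile.compression.type=BLOCK;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- SET applies to queries run later in the same session; existing files are not rewritten.
INSERT OVERWRITE TABLE raw_events_seq
SELECT event_time, payload FROM raw_events;
```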

Question 1:
Do I need to set io.seqfile.compress.blocksize? If so, to what? It is set
to 1000000 by default.

Question 2:
Do I need to set hive.merge.mapfiles? If so, to what? It's set to true 
by default.

Question 3:
Are there any other options I need to set?

Thanks again!

Mark
P.S. I am using Hive 0.7.1 with Hadoop 0.20.

On 11-09-15 05:41 PM, yongqiang he wrote:
>>> Question 1:
> Indexing should work for both, but I suggest you use block compression.
>
>>> Question 3 (and perhaps, the most important):
> Block-based compression.
>
>
> On Thu, Sep 15, 2011 at 2:16 PM, Mark Grover <mgrover@oanda.com> wrote:
>> Hi all,
>> I have a question about compression and indexing.
>>
>> I would like to compress our Hive data (currently stored as SequenceFiles).
>> I also have an index on this table and would like to keep using it.
>>
>> Question 1:
>> SequenceFile compression can be block-based or record-based. For indexing
>> to work, do I need block-based compression? If both block-based and
>> record-based compression work with indexing, can someone explain when to
>> use which?
>>
>> Question 2:
>> BZip2 is also a block-based compression format and is splittable in Hadoop.
>> Do you see any issues with storing data in BZip2 files and using indexing
>> on that data?
>>
>> Question 3 (and perhaps, the most important):
>> What are the best practices for compression (with or without indexing)?
>> Are folks typically using SequenceFile compression rather than other
>> compression formats (like BZip2)? If using SequenceFile compression, are
>> folks using record-based or block-based compression?
>>
>>
>> Thank you in advance!
>> Mark
>>
>> --
>> Mark Grover, Business Intelligence Analyst
>> OANDA Corporation
>>
>> www: oanda.com www: fxtrade.com
>> e: mgrover@oanda.com
>>
>> "Best Trading Platform" - World Finance's Forex Awards 2009.
>> "The One to Watch" - Treasury Today's Adam Smith Awards 2009.
>>
>>
