hive-user mailing list archives

From "Hamilton, Robert (Austin)" <robert.hamil...@hp.com>
Subject RE: help with compression and index
Date Tue, 21 Feb 2012 23:20:40 GMT
The automatic index handling will be very cool. I'm testing 0.8.1 on our system now and will
see how it goes.
Thanks Mark and Bejoy!


-----Original Message-----
From: Mark Grover [mailto:mgrover@oanda.com] 
Sent: Tuesday, February 21, 2012 4:03 PM
To: user@hive.apache.org
Subject: Re: help with compression and index

Hi Robert,
As per https://issues.apache.org/jira/browse/HIVE-1644, Hive 0.8 introduces automatic use
of indexes. That might come in handy too!
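For comparison, here is a minimal sketch of what the automatic path might look like, written as a small Python helper that just emits the HiveQL. It assumes the configuration property `hive.optimize.index.filter` (my understanding is that HIVE-1644 added it in 0.8), and it uses a hypothetical table `detail` whose column `key` already has an up-to-date compact index. The names are illustrative only.

```python
"""Sketch: HiveQL for an index-assisted lookup relying on Hive 0.8's
automatic index use (HIVE-1644). Table and column names are hypothetical."""


def auto_index_lookup(table: str, column: str, keys: list[int]) -> list[str]:
    """Return the statements in order; assumes the index is already rebuilt."""
    # OR-chain of equality tests on the (hypothetically) indexed column.
    predicate = " OR ".join(f"{column} = {k}" for k in keys)
    return [
        # Assumed setting: let the optimizer use an index on `column`.
        "SET hive.optimize.index.filter=true",
        # Plain query: no temp directory, no hive.input.format override.
        f"SELECT * FROM {table} WHERE {predicate}",
    ]


if __name__ == "__main__":
    for stmt in auto_index_lookup("detail", "key", [100, 200]):
        print(stmt + ";")
```

Compared with the manual 0.7.1 recipe, the optimizer is meant to find the index and restrict the scan by itself. Users would then write an ordinary query.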

Mark

Mark Grover, Business Intelligence Analyst OANDA Corporation 

www: oanda.com www: fxtrade.com
e: mgrover@oanda.com 

----- Original Message -----
From: "Bejoy Ks" <bejoy_ks@yahoo.com>
To: user@hive.apache.org
Sent: Tuesday, February 21, 2012 11:47:56 AM
Subject: Re: help with compression and index

Hi Hamilton,
When you are building the index (generating the index files), is compression enabled? If so,
you are running into this known issue:
https://issues.apache.org/jira/browse/HIVE-2331

It is fixed in Hive 0.8, so an upgrade is recommended and should get things working for you.

Regards
Bejoy.K.S 

From: "Hamilton, Robert (Austin)" <robert.hamilton@hp.com> 
To: "user@hive.apache.org" <user@hive.apache.org> 
Sent: Tuesday, February 21, 2012 8:48 PM 
Subject: help with compression and index 

Hi all. I sent this to common-user@hadoop hoping for an easy answer, but got no response.


I have a couple of users who basically have no use case other than extracting specific
rows based on a predetermined set of keys, so I would like to just provide them with an
index and show them how to join to the detail table using it. So I'm looking for a reliable
compression+index method with Hive. To give an idea of the data size, my files add up to
about 80 TB uncompressed, currently gzipped down to only 10 TB. I need to keep it small(ish)
until I can get more disk space, so it has to stay compressed.

I don't mind recompressing to LZO or bzip, but I need to prove that it would actually work
first :)

I've done my testing on LZO and uncompressed test samples. With uncompressed files, the
indexed select works OK. With LZO, it returns only a fraction of the rows I expect. Am I right
in gathering that files compressed with other methods cannot be indexed at all with Hive 0.7.1?


I'm following the prescription to select buckets/offsets into a temporary file, set
hive.index.compact.file to the temp file, set hive.input.format to HiveCompactIndexInputFormat,
and run my select. That doesn't let me do subselects, but I don't mind, as it is only a very
limited use case that I need to support.
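The recipe above can be sketched as the ordered list of HiveQL statements it implies. The sketch is a Python helper that only builds the strings. The table `detail`, column `key`, index name `key_idx` and result directory are hypothetical. The index-table naming convention `default__<table>_<index>__` is an assumption based on how Hive 0.7 names compact index tables in the `default` database.

```python
"""Sketch: HiveQL for the manual compact-index lookup (Hive 0.7.x style).
Table, column, index and directory names are hypothetical."""

COMPACT_INPUT_FORMAT = (
    "org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat"
)


def compact_index_lookup(table: str, column: str, index: str,
                         keys: list[int], result_dir: str) -> list[str]:
    """Return the HiveQL statements, in order, for one indexed lookup."""
    # Assumed naming convention for the generated index table.
    index_table = f"default__{table}_{index}__"
    # OR-chain of equality tests on the indexed column.
    predicate = " OR ".join(f"{column} = {k}" for k in keys)
    return [
        # 1. Build the compact index (deferred, then an explicit rebuild).
        f"CREATE INDEX {index} ON TABLE {table}({column}) "
        f"AS 'COMPACT' WITH DEFERRED REBUILD",
        f"ALTER INDEX {index} ON {table} REBUILD",
        # 2. Write bucket names/offsets for the wanted keys to a directory.
        f"INSERT OVERWRITE DIRECTORY '{result_dir}' "
        f"SELECT `_bucketname`, `_offsets` FROM {index_table} "
        f"WHERE {predicate}",
        # 3. Point the compact-index input format at that directory.
        f"SET hive.index.compact.file={result_dir}",
        f"SET hive.input.format={COMPACT_INPUT_FORMAT}",
        # 4. Run the real query; input is limited to the listed files/offsets.
        f"SELECT * FROM {table} WHERE {predicate}",
    ]


if __name__ == "__main__":
    for stmt in compact_index_lookup("detail", "key", "key_idx",
                                     [100, 200], "/tmp/key_idx_result"):
        print(stmt + ";")
```

In practice the CREATE/REBUILD steps would run once (and after each data load). Steps 2 to 4 would be repeated per lookup.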

This is the only method I could find documented on the net. Is there a better way to do this?
I don't mind upgrading Hive (currently 0.7.1) or Hadoop (currently 0.20.2).