hive-user mailing list archives

From Joydeep Sen Sarma <>
Subject RE: Keeping Data compressed
Date Thu, 19 Mar 2009 03:47:30 GMT
Hey - not sure if anyone responded.

SequenceFiles are the way to go if you want parallelism on the files as well (since gz-compressed
files cannot be split).

One simple way to do this is to start with text files, build a (potentially external) table
on them, and load them into another table that is declared to be stored as a SequenceFile.
The load can simply be an 'INSERT OVERWRITE TABLE XXX SELECT * FROM YYY' on the first table
(YYY). The first table is just a tmp table used to do the loading.
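Concretely, the flow above might look like the following, a minimal sketch with hypothetical table names (raw_logs, logs_seq), column (line), and HDFS path:

```sql
-- Hypothetical external table over the existing text files in HDFS.
CREATE EXTERNAL TABLE raw_logs (line STRING)
  STORED AS TEXTFILE
  LOCATION '/user/data/raw_logs';

-- Target table declared to be stored as a SequenceFile.
CREATE TABLE logs_seq (line STRING)
  STORED AS SEQUENCEFILE;

-- The "load" is just an insert-select from the tmp table into the target.
INSERT OVERWRITE TABLE logs_seq SELECT * FROM raw_logs;
```

After this, queries run against logs_seq and the external raw_logs table can be dropped without deleting the underlying files.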

Whether the data is compressed or not as a result is controlled by the Hive option 'hive.exec.compress.output'.
If this is set to true, the codec used is whatever is dictated by the Hadoop options that control
the codec. The relevant options are:

  mapred.output.compression.codec
  mapred.output.compression.type

You want to set them to your preferred codec class and BLOCK, respectively.
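Putting the settings together, the session might look like this; GzipCodec here is only an example choice of codec, not something mandated above:

```sql
-- Enable compressed query output in Hive.
SET hive.exec.compress.output=true;

-- Compress SequenceFile output per block rather than per record.
SET mapred.output.compression.type=BLOCK;

-- Example codec (assumption): gzip. Any installed Hadoop codec class works.
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
```

With these set before the INSERT OVERWRITE, the resulting SequenceFile blocks are compressed but the file remains splittable for parallel reads.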

Hope this helps,


-----Original Message-----
From: Bob Schulze [] 
Sent: Wednesday, March 18, 2009 8:07 AM
Subject: Keeping Data compressed

	I want to keep data in Hadoop compressed, ready for Hive selects to run on.

Is using SequenceFiles with compression the way to go?

How can I get my data into Hive tables "as sequencefile", with an
underlying compression?

Thx for any ideas,

