hive-user mailing list archives

From Sachin Sudarshana <sachin.had...@gmail.com>
Subject Sequence file compression in Hive
Date Mon, 10 Jun 2013 07:48:16 GMT
Hi,

I have a table, facts520_normal_seq, stored as SEQUENCEFILE in hive-0.10.

Now I wish to create another table, also stored as a SEQUENCEFILE, but
compressed using the Gzip codec.

So I set the compression codec to Gzip and the compression type to BLOCK,
and then executed the following query:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapred.output.compression.type=BLOCK;

create table test1facts520_gzip_seq as select * from facts520_normal_seq;

The table got created and was compressed as well.

[root@aana1 comp_data]# sudo -u hdfs hadoop fs -ls /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq
Found 5 items
-rw-r--r--   3 admin supergroup   38099145 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000000_0.gz
-rw-r--r--   3 admin supergroup   31450189 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000001_0.gz
-rw-r--r--   3 admin supergroup   20764259 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000002_0.gz
-rw-r--r--   3 admin supergroup   21107597 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000003_0.gz
-rw-r--r--   3 admin supergroup   12202692 2013-06-10 17:56 /user/hive/warehouse/facts_520.db/test1facts520_gzip_seq/000004_0.gz

However, when I checked the table properties, I was surprised to see that
the table had been stored as a textfile!

hive> show create table test1facts520_gzip_seq;
OK
CREATE  TABLE test1facts520_gzip_seq(
  fact_key bigint,
  products_key int,
  retailers_key int,
  suppliers_key int,
  time_key int,
  units int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://aana1.ird.com/user/hive/warehouse/facts_520.db/test1facts520_gzip_seq'
TBLPROPERTIES (
  'numPartitions'='0',
  'numFiles'='5',
  'transient_lastDdlTime'='1370867198',
  'numRows'='0',
  'totalSize'='123623882',
  'rawDataSize'='0')
Time taken: 0.15 seconds

So I tried adding the STORED AS clause to my earlier create table
statement and created a new table:

create table test3facts520_gzip_seq STORED AS SEQUENCEFILE as select * from facts520_normal_seq;

This time, the output table was stored as a SEQUENCEFILE:

hive> show create table test3facts520_gzip_seq;
OK
CREATE  TABLE test3facts520_gzip_seq(
  fact_key bigint,
  products_key int,
  retailers_key int,
  suppliers_key int,
  time_key int,
  units int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'hdfs://aana1.ird.com/user/hive/warehouse/facts_520.db/test3facts520_gzip_seq'
TBLPROPERTIES (
  'numPartitions'='0',
  'numFiles'='5',
  'transient_lastDdlTime'='1370867777',
  'numRows'='0',
  'totalSize'='129811519',
  'rawDataSize'='0')
Time taken: 0.135 seconds

But the compression itself does not seem to have happened; the output
files have no .gz extension:

[root@aana1 comp_data]# sudo -u hdfs hadoop fs -ls /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq
Found 5 items
-rw-r--r--   3 admin supergroup   40006368 2013-06-10 18:06 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000000_0
-rw-r--r--   3 admin supergroup   33026961 2013-06-10 18:06 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000001_0
-rw-r--r--   3 admin supergroup   21797242 2013-06-10 18:05 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000002_0
-rw-r--r--   3 admin supergroup   22171637 2013-06-10 18:05 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000003_0
-rw-r--r--   3 admin supergroup   12809311 2013-06-10 18:05 /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000004_0
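
The file names alone may not settle this, since a SequenceFile records its
compression codec in the file header rather than in the file name. A minimal
sketch for looking at that header follows. The function name seqfile_codec is
hypothetical, and the sketch assumes the codec class lives in the
org.apache.hadoop.io.compress package (as GzipCodec does) and that the class
names fit in the first 512 bytes. It reports only the codec, not whether the
type is RECORD or BLOCK.

```shell
# Print the compression codec class named in a SequenceFile header on stdin.
# Prints "none" if no codec is named, "not a SequenceFile" if the magic is missing.
seqfile_codec() {
    # Keep the printable bytes of the first 512 bytes; every other byte
    # (length prefixes, flags, sync marker) is turned into a newline.
    header=$(head -c 512 | LC_ALL=C tr -c '[:print:]' '\n')

    # A SequenceFile starts with the three magic bytes "SEQ".
    case "$header" in
        SEQ*) ;;
        *) echo "not a SequenceFile"; return 1 ;;
    esac

    # The codec class name is written to the header only when compression is on.
    codec=$(printf '%s\n' "$header" |
        grep -o 'org\.apache\.hadoop\.io\.compress\.[A-Za-z0-9]*Codec' |
        head -n 1)
    echo "${codec:-none}"
}
```

It could be run against one of the files above like this:

sudo -u hdfs hadoop fs -cat /user/hive/warehouse/facts_520.db/test3facts520_gzip_seq/000000_0 | seqfile_codec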

Is there anything that I have done wrong, or have I missed something?

Any help would be greatly appreciated!

Thank you,
Sachin
