hive-user mailing list archives

From Techy Teck
Subject Efficiently Store data in Hive
Date Wed, 01 Aug 2012 18:48:03 GMT
How can I store data efficiently in Hive, and how can I store and retrieve
compressed data in Hive?

Currently I am storing it as a TextFile.

I was going through Bejoy's article (
and I found that LZO compression is a good fit for storing the files, and
that it is also splittable.

I have a HiveQL SELECT query that generates some output, and I store that
output so that one of my Hive tables (quality) can use the data and I can
then query that table.

Below is the quality table, followed by the INSERT OVERWRITE statement that
loads one partition of it from the SELECT query.

create table quality
(id bigint,
  total bigint,
  error bigint
)
partitioned by (ds string)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/uname/quality'

insert overwrite table quality partition (ds='20120709')
SELECT id, count2, coalesce(error, cast(0 AS BIGINT)) AS count1 FROM

So currently I am storing it as a TextFile. Should I switch to a
SequenceFile and start storing the data with LZO compression, or will a
text file be fine here as well? The SELECT query returns several GB of
data, which needs to be loaded into the quality table on a daily basis.

So which way is best? Should I store the output as a TextFile or as a
SequenceFile (with LZO compression), so that queries against the Hive
quality table run faster?
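
For reference, the SequenceFile option asked about above could be sketched
roughly as follows. This is only an illustration under stated assumptions:
the hadoop-lzo codec is installed on the cluster, the property names are the
Hadoop 1.x-era ones, and quality_seq, its location, and quality_staging
(with its columns) are hypothetical names, since the FROM clause of the
original query is cut off. Whether this actually queries faster than the
TextFile table would have to be measured on the real data.

```sql
-- Compress the output of the INSERT below (session-level settings).
-- Assumes the hadoop-lzo codec classes are available on the cluster.
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;

-- Hypothetical SequenceFile counterpart of the quality table.
CREATE TABLE quality_seq (
  id    BIGINT,
  total BIGINT,
  error BIGINT
)
PARTITIONED BY (ds STRING)
STORED AS SEQUENCEFILE
LOCATION '/user/uname/quality_seq';

-- quality_staging is a hypothetical stand-in for the source of the
-- original SELECT, whose FROM clause is not shown in the message.
INSERT OVERWRITE TABLE quality_seq PARTITION (ds='20120709')
SELECT id, total, COALESCE(error, CAST(0 AS BIGINT)) AS error
FROM quality_staging;
```

Block-compressed SequenceFiles stay splittable regardless of the codec,
whereas LZO-compressed text files are only splittable once they have been
indexed.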
