hive-user mailing list archives

From Raj Hadoop <hadoop...@yahoo.com>
Subject Re: External Partition Table
Date Thu, 31 Oct 2013 22:56:10 GMT


Thanks Tim. I am using a String column for the partition column. 



On Thursday, October 31, 2013 6:49 PM, Timothy Potter <thelabdude@gmail.com> wrote:
 
Hi Raj,
This seems more a matter of style than of any performance benefit or cost. If you're going
to run a lot of queries based only on month or year, then #2 might be easier, e.g.

select * from foo where year = 2013

seems a little cleaner than

select * from foo where date >= 20130101 and date <= 20131231

(not sure how you're encoding dates into an INT, but I think you get the idea).

I do something similar, but my partition fields are strings, like 2013-10-31_0000 (which has
the nice property that lexical sort order matches numeric sort order).
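
A minimal sketch of that sorting property, assuming the trailing _0000 is a zero-padded HHMM time (the thread does not say what it encodes); partition_key is a hypothetical helper, not anything from Hive:

```python
from datetime import datetime, timedelta
import random


def partition_key(ts: datetime) -> str:
    """Format a timestamp as a zero-padded key such as '2013-10-31_0000'.

    Assumption: the suffix is HHMM; the original message does not specify it.
    """
    return ts.strftime("%Y-%m-%d_%H%M")


# Six hourly timestamps that cross midnight, in chronological order.
start = datetime(2013, 10, 30, 22, 0)
keys = [partition_key(start + timedelta(hours=h)) for h in range(6)]

# Shuffle, then sort as plain strings: chronological order comes back.
shuffled = keys[:]
random.shuffle(shuffled)
assert sorted(shuffled) == keys

# Without zero padding the property breaks: October sorts before September.
unpadded = sorted(["2013-9-1", "2013-10-1"])
assert unpadded == ["2013-10-1", "2013-9-1"]
```

The property depends on every field being fixed-width and ordered from most to least significant, which is why range filters on such string keys behave like date ranges.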

I'm assuming both will perform the same, because Hive still selects the same number of
input paths in either scenario; one layout just happens to be a little deeper.
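
To illustrate the "same number of input paths" point, here is a rough model rather than anything Hive itself runs. It assumes a hypothetical table root /data/foo and Hive's default key=value partition directory naming, then counts the leaf directories a query for October 2013 would need under each layout:

```python
from datetime import date, timedelta

ROOT = "/data/foo"  # hypothetical table location, for illustration only


def days(first: date, last: date):
    """Yield every calendar day from first to last, inclusive."""
    d = first
    while d <= last:
        yield d
        d += timedelta(days=1)


def flat_path(d: date) -> str:
    """Layout 1: one folder per day, keyed by an INT such as 20131031."""
    return f"{ROOT}/date={d:%Y%m%d}"


def nested_path(d: date) -> str:
    """Layout 2: year / month / day folders."""
    return f"{ROOT}/year={d.year}/month={d.month}/day={d.day}"


all_days = list(days(date(2013, 1, 1), date(2013, 12, 31)))

# Layout 1 predicate: date >= 20131001 and date <= 20131031
flat_selected = [
    flat_path(d) for d in all_days
    if 20131001 <= int(f"{d:%Y%m%d}") <= 20131031
]

# Layout 2 predicate: year = 2013 and month = 10
nested_selected = [
    nested_path(d) for d in all_days
    if d.year == 2013 and d.month == 10
]

# Both layouts end up reading one leaf directory per day of the month.
assert len(flat_selected) == len(nested_selected) == 31
```

In this model the only difference is path depth: the nested layout adds two directory levels above each daily folder, while the number of folders actually scanned is identical.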

Cheers,
Tim



On Thu, Oct 31, 2013 at 4:34 PM, Raj Hadoop <hadoopraj@yahoo.com> wrote:

>Hi,
>
>
>I am planning for a Hive External Partition Table based on a date.
>
>
>Which of the options below yields better performance, or do both perform the same?
>
>
>1) Partition based on one folder per day
>LIKE date INT
>2) Partition based on one folder per year / month / day (so it has three folder levels)
>LIKE year INT, month INT, day INT
>
>
>Thanks,
>Raj
>
>