hive-user mailing list archives

From Raj Hadoop <hadoop...@yahoo.com>
Subject Re: External Partition Table
Date Thu, 31 Oct 2013 22:42:54 GMT
Hi Brad,

Thanks for the quick response.

I have about 10 GB of web logs per day, and I am creating one folder (partition) per
day. Is that uncommon?
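For scale, here is a rough sizing sketch of the one-partition-per-day layout, using the ~10 GB/day figure above. The 128 MB HDFS block size is an assumption (a common default; check your cluster's configured block size), and the helper names are hypothetical. The block count is a lower bound, since files that do not fill their last block add more.

```python
# Rough sizing for one Hive partition per day, using ~10 GB of logs per day.
# ASSUMPTION: 128 MB HDFS block size; binary units (1 GB = 1024 MB).

DAILY_GB = 10
BLOCK_MB = 128  # assumed, not stated in the thread


def partitions_per_year(days_per_year: int = 365) -> int:
    """One daily partition per day of data."""
    return days_per_year


def blocks_per_partition(daily_gb: float = DAILY_GB, block_mb: int = BLOCK_MB) -> int:
    """Minimum number of HDFS blocks one day's data occupies (ceiling division)."""
    total_mb = daily_gb * 1024
    return int(-(-total_mb // block_mb))


if __name__ == "__main__":
    print("partitions per year:", partitions_per_year())            # 365
    print("GB per year:", DAILY_GB * partitions_per_year())         # 3650
    print("blocks per daily partition:", blocks_per_partition())    # 80
```

A few hundred partitions per year, each holding tens of blocks, is a modest count; the concern Brad raises below is more about how many small files each partition ends up with.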

At this point I don't know what kinds of queries I will be running against this table.
I just want to know whether this is a normal setup or not.

Thanks,
Raj



On Thursday, October 31, 2013 6:39 PM, Brad Ruderman <bruderman@radiumone.com> wrote:
 
Wow, that question can't really be answered in general. It all depends on the amount of data
per partition, the queries you are going to run against it, and the structure of the data.
In general in Hive (depending on your cluster size) you need to balance the number of files
against their size: a smaller number of files is typically preferred, but partitions help when
you restrict queries by date.
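To illustrate why partitions help with date-restricted queries, the sketch below mimics partition pruning: a filter on the partition column lets the query skip every directory outside the range. The column name `dt`, its yyyymmdd integer encoding, and the helper functions are hypothetical; this models the idea, not Hive's planner.

```python
from datetime import date, timedelta


def daily_partitions(start: date, days: int) -> list[str]:
    """Hypothetical daily partition directory names, e.g. 'dt=20131031'."""
    return [f"dt={(start + timedelta(days=i)):%Y%m%d}" for i in range(days)]


def prune(partitions: list[str], lo: int, hi: int) -> list[str]:
    """Keep partitions whose dt value lies in [lo, hi] (inclusive), the way a
    date-restricted WHERE clause lets Hive skip the other directories."""
    kept = []
    for p in partitions:
        value = int(p.split("=", 1)[1])
        if lo <= value <= hi:
            kept.append(p)
    return kept


if __name__ == "__main__":
    parts = daily_partitions(date(2013, 1, 1), 365)
    week = prune(parts, 20131025, 20131031)
    print(len(week), "of", len(parts), "partitions read")  # 7 of 365
```

With one partition per day, a one-week query touches 7 of 365 directories; without a date filter, every partition is scanned regardless of layout.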

Thx,
Brad



On Thu, Oct 31, 2013 at 3:34 PM, Raj Hadoop <hadoopraj@yahoo.com> wrote:

>Hi,
>
>
>I am planning a Hive external table partitioned by date.
>
>
>Which of the options below performs better, or do both perform the same?
>
>
>1) Partition with one folder per day,
>   e.g. date INT
>2) Partition with nested year / month / day folders (three folder levels),
>   e.g. year INT, month INT, day INT
>
>
>Thanks,
>Raj
>
>
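To make the two options concrete, the sketch below builds the directory path each layout would use for a day and applies an equivalent date-range filter to both. The column name `dt`, the path format, and the HiveQL in the comments are illustrative assumptions. Both layouts end up with one leaf partition per day and select the same days; the difference shows up in how the range predicate has to be written once it crosses a month boundary.

```python
from datetime import date, timedelta


def single_column_path(d: date) -> str:
    """Option 1 layout: one partition column holding yyyymmdd as an INT."""
    return f"dt={d:%Y%m%d}"


def three_column_path(d: date) -> str:
    """Option 2 layout: nested year/month/day partition directories."""
    return f"year={d.year}/month={d.month}/day={d.day}"


def option1_filter(dt: int) -> bool:
    # Roughly: WHERE dt BETWEEN 20131025 AND 20131105
    return 20131025 <= dt <= 20131105


def option2_filter(year: int, month: int, day: int) -> bool:
    # Roughly: WHERE year = 2013
    #   AND ((month = 10 AND day >= 25) OR (month = 11 AND day <= 5))
    return year == 2013 and ((month == 10 and day >= 25) or (month == 11 and day <= 5))


def selected_paths(start: date, days: int) -> tuple[list[str], list[str]]:
    """Apply each layout's filter to every day in the window; return kept paths."""
    opt1, opt2 = [], []
    for i in range(days):
        d = start + timedelta(days=i)
        if option1_filter(int(f"{d:%Y%m%d}")):
            opt1.append(single_column_path(d))
        if option2_filter(d.year, d.month, d.day):
            opt2.append(three_column_path(d))
    return opt1, opt2


if __name__ == "__main__":
    opt1, opt2 = selected_paths(date(2013, 1, 1), 365)
    print(len(opt1), len(opt2))  # both select the same 12 days
    print(opt1[0], opt2[0])      # dt=20131025 year=2013/month=10/day=25
```

This is an illustration of the layouts, not a benchmark; it says nothing about actual Hive query times on a given cluster.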