hive-user mailing list archives

Subject Re: how to load data to partitioned table
Date Sun, 14 Aug 2011 16:15:03 GMT
Yes, I very much agree with you on that. Using only the basic approach can easily run into
memory issues with large datasets. I resolved some of those by using the DISTRIBUTE BY
clause and the like. In short, a little rework of your Hive queries can help you out in
some cases.
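A minimal HiveQL sketch of the DISTRIBUTE BY idea, assuming the sales/sales_temp tables from this thread, hypothetical columns sale_id, product_id and amount, and that dynamic partitioning is already enabled (hive.exec.dynamic.partition=true, hive.exec.dynamic.partition.mode=nonstrict). Distributing by the partition column sends all rows for a given period_key to the same reducer, so each reducer writes only its own partitions instead of every task writing to every partition, which can ease memory pressure on large loads.

```sql
-- Hypothetical columns sale_id, product_id, amount; period_key is the partition column.
-- DISTRIBUTE BY routes every row with the same period_key to the same reducer,
-- so each reducer only opens writers for the partitions it was assigned.
INSERT OVERWRITE TABLE sales PARTITION (period_key)
SELECT sale_id, product_id, amount, period_key
FROM sales_temp
DISTRIBUTE BY period_key;
```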
Bejoy K S

-----Original Message-----
From: hadoopman <>
Date: Sun, 14 Aug 2011 08:57:12 
To: <>
Subject: Re: how to load data to partitioned table

Something else I've noticed: when loading LOTS of historical data, if you
can load, say, a month of data at a time, try to load just THAT month of
data and only that month. I've been able to load several years of data
(depending on the data) in a single load; however, there have been times
when loading a large dataset that I ran into memory issues during the
reduce phase (usually during shuffle/sort). Errors ranged from out of
memory to stack overflow messages (I've compiled a list of the more fun
ones).

Then I noticed that loading data from, say, a single month at a time went
quickly and without the memory headaches during the reduce phase.

Something to keep in mind, and it works great!
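A sketch of the month-at-a-time load, assuming period_key holds 'yyyy-MM-dd' date strings (the thread does not state its format), the same hypothetical columns sale_id, product_id and amount, and dynamic partitioning already enabled. Because 'yyyy-MM-dd' strings sort chronologically, a plain range filter selects one month; rerun the statement with the next month's bounds.

```sql
-- Load only July 2011 (assumes period_key is a 'yyyy-MM-dd' string).
-- With dynamic partitions, INSERT OVERWRITE replaces only the partitions
-- this query produces, so previously loaded months are left in place.
INSERT OVERWRITE TABLE sales PARTITION (period_key)
SELECT sale_id, product_id, amount, period_key
FROM sales_temp
WHERE period_key >= '2011-07-01' AND period_key < '2011-08-01';
```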

On 08/12/2011 07:58 AM, wrote:
> Hi Daniel,
> Looking at your requirement to load data into a partitioned Hive table
> from an input file, the most hassle-free approach would be:
> 1. Load the data into a non-partitioned table with a structure similar
> to the target table.
> 2. Populate the target table with the data from the non-partitioned one
> using Hive's dynamic partition approach.
> With dynamic partitions you don't need to manually identify the data
> partitions and distribute the data accordingly.
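Step 1 might look like the following sketch. The column names, types, and the local file path are hypothetical, and the CSV is assumed to be comma-delimited with no header row. The staging table carries period_key as an ordinary last column, while the target declares it only in PARTITIONED BY; step 2 is then an INSERT ... PARTITION (period_key) SELECT from the staging table.

```sql
-- Target table: period_key is a partition column, not a regular column.
CREATE TABLE sales (
  sale_id    INT,
  product_id INT,
  amount     DOUBLE
)
PARTITIONED BY (period_key STRING);

-- Staging table: same columns, with period_key as an ordinary last column.
CREATE TABLE sales_temp (
  sale_id    INT,
  product_id INT,
  amount     DOUBLE,
  period_key STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Copy the CSV into the staging table (the path is a placeholder).
LOAD DATA LOCAL INPATH '/tmp/sales.csv' OVERWRITE INTO TABLE sales_temp;
```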
> A similar implementation is described in the blog post
> Hope it helps.
> Regards
> Bejoy K S
> ------------------------------------------------------------------------
> From: Vikas Srivastava <>
> Date: Fri, 12 Aug 2011 17:31:28 +0530
> To: <>
> ReplyTo:
> Subject: Re: how to load data to partitioned table
> Hey,
> Simply, you have to run a query like this:
> FROM sales_temp INSERT OVERWRITE TABLE sales partition(period_key) 
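The query above stops before its SELECT clause. A completed version might look like this, assuming hypothetical columns sale_id, product_id and amount alongside period_key. With dynamic partitioning, Hive takes the partition value from the last column of the SELECT list, and the dynamic-partition settings must be switched on first; nonstrict mode allows every partition column to be dynamic.

```sql
-- Enable dynamic partitions; nonstrict lets period_key be fully dynamic.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The last SELECT column (period_key) feeds the dynamic partition column.
FROM sales_temp
INSERT OVERWRITE TABLE sales PARTITION (period_key)
SELECT sale_id, product_id, amount, period_key;
```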
> Regards
> Vikas Srivastava
> 2011/8/12 Daniel,Wu < <>>
>     Suppose the table is partitioned by period_key, and the CSV file
>     also has a column named period_key. The CSV file contains
>     multiple days of data; how can we load it into the table?
>     I thought of a workaround: first load the data into a
>     non-partitioned table, and then insert the data from the
>     non-partitioned table into the partitioned table.
>     hive> INSERT OVERWRITE TABLE sales SELECT * FROM sales_temp;
>     FAILED: Error in semantic analysis: need to specify partition
>     columns because the destination table is partitioned.
>     However, that doesn't work either. Please help.
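The error appears because the INSERT names no partition for a partitioned target. Besides the dynamic approach, one can name a static partition and filter the staging table to match it. A sketch with a hypothetical date value and hypothetical columns; the partition column is left out of the SELECT list because its value comes from the PARTITION clause.

```sql
-- Static partition: the value is fixed in the PARTITION clause,
-- so period_key is not selected from sales_temp.
INSERT OVERWRITE TABLE sales PARTITION (period_key = '2011-08-01')
SELECT sale_id, product_id, amount
FROM sales_temp
WHERE period_key = '2011-08-01';
```

Repeating this for every day in the file is the manual bookkeeping that dynamic partitions remove.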
> -- 
> With Regards
> Vikas Srivastava
> DWH & Analytics Team
> Mob:+91 9560885900
> One97 | Let's get talking !
