hive-user mailing list archives

From Maxime Brugidou <>
Subject Re: Hive Dynamic Partions - How to avoid overwrite
Date Tue, 04 Oct 2011 18:20:15 GMT
I suspect you can't do that unless you use Hive 0.8.
From the wiki:

"INSERT INTO will append to the table or partition keeping the existing data
intact. (Note: INSERT INTO syntax is only available starting in version
0.8.)"

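A minimal sketch of what an append could look like on 0.8 or later, combining
INSERT INTO with a dynamic partition insert. The table names (staging, events)
and columns (x, y, z, country) are hypothetical placeholders, and this has not
been checked against a specific build:

```sql
-- Hypothetical names: staging is the non-partitioned source table,
-- events is the target table partitioned by country.
-- Assumes Hive 0.8 or later, where INSERT INTO is available.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- INSERT INTO appends to each country partition instead of replacing it.
-- The dynamic partition column (country) must come last in the SELECT list.
INSERT INTO TABLE events PARTITION (country)
SELECT x, y, z, country
FROM staging;
```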
If you don't have 0.8, then I suggest that you simply partition by day in
addition to Country so that you don't overwrite previous days. Use INSERT
OVERWRITE as usual.

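The day-plus-country layout could be sketched roughly as follows. The table
names (staging, events_by_day), the column names and types, and the dt
partition column are all assumptions made for illustration. The day is given
as a static partition value and country is left dynamic, since Hive requires
static partition columns to come before dynamic ones:

```sql
-- Hypothetical names: staging has columns x, y, z, country;
-- events_by_day is the target, partitioned by load day (dt) and country.
CREATE TABLE IF NOT EXISTS events_by_day (x STRING, y STRING, z STRING)
PARTITIONED BY (dt STRING, country STRING);

SET hive.exec.dynamic.partition=true;

-- dt is static and country is dynamic, so each daily run writes only the
-- (dt, country) partitions for its own day. OVERWRITE therefore replaces
-- nothing that was loaded on earlier days.
INSERT OVERWRITE TABLE events_by_day PARTITION (dt='2011-10-04', country)
SELECT x, y, z, country
FROM staging;
```

Rerunning the load for the same day replaces only that day's partitions, which
keeps the daily job safe to repeat. Queries over all days for one country
would filter on country and read across the dt partitions.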

On Tue, Oct 4, 2011 at 8:14 PM, Bejoy Ks <> wrote:

> Thanks Florin for your response.
> But I have a concern with the suggested approach. My partitioned table
> would, in the course of time, hold hundreds of terabytes of data. So doing a
> UNION ALL over it every time I load data from the staging table into the
> production partitioned table would be far too expensive.
> Is there any other workaround you feel would be suitable in my case?
> Thanks and Regards
> Bejoy.K.S
> ------------------------------
> *From:* Florin Diaconeasa <>
> *To:*; Bejoy Ks <>
> *Sent:* Tuesday, October 4, 2011 2:46 AM
> *Subject:* Re: Hive Dynamic Partions - How to avoid overwrite
> I would recommend doing the following SELECT:
> SELECT x,y,z FROM
> (
>  SELECT x,y,z
>  FROM <input_table>
>  UNION ALL
>  SELECT x,y,z
>  FROM <target_table>
> ) allTables;
> Obviously, there are rules that come with UNION ALL, such as the need to
> name (using aliases if necessary) all the columns of each select. More on
> this on the hive wiki.
> Florin
> On Oct 3, 2011, at 5:02 PM, Bejoy Ks wrote:
> Hi Experts
>     I'm intending to use hive dynamic partition approach on my current
> business use case. What I have in mind for the design is as follows.
> -Load my incoming data into a non partitioned hive table (Table 1)
> -Load this data into partitioned hive table using Dynamic Partitions(Table
> 2)
> -Flush the data in Table1(Drop Table and Recreate the same)
> With this series of steps my data would be ready for mining.
>     This is going to be a periodic process happening daily. When I searched
> around I came across a concern with this approach: 'the partitions getting
> overwritten'.
> For example, say my second table is partitioned by Country, and in my
> first load data is populated in the partition country=USA. When my
> Dynamic Partition load/insert is executed a second time and the source
> data again contains rows with country=USA, the data that is already in
> the partition would be overwritten with the new data.
> Is my understanding of this scenario right? If so, what would be the
> recommended approach to overcome this hurdle? Basically, I want the
> existing data in the partition to be preserved while new data is added
> to it. I can't go ahead with the static partition approach because my data is
> huge and the number of partitions is also pretty large. Has someone framed
> effective solutions for such scenarios with the Dynamic Partition insert
> approach? Can someone guide me to a suitable approach with Hive for such
> use cases?
> Thanks and Regards
> Bejoy.K.S
