hive-user mailing list archives

From Ning Zhang <nzh...@fb.com>
Subject Re: can I use hive dynamic partition while loading data into tables?
Date Fri, 15 Apr 2011 17:04:18 GMT
You can create an hourly partitioned table, say H, with the partition columns (dt, country, hour);
then the INSERT OVERWRITE command will become: ...

INSERT OVERWRITE TABLE H... PARTITION (dt='...', country, hr) ... select ...  country, hour
..

You can also keep the old table page_view_stg's schema unchanged but make it an external table
pointing to H. In this way your old queries on page_view_stg don't need to be changed.
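
[Editor's sketch] The hourly layout suggested above could look roughly like the following. This is only an illustration under stated assumptions: the table name page_view_hourly (standing in for "H"), the trimmed column list, the partition column name hr, and the idea that viewTime holds a Unix timestamp in seconds are all hypothetical and not taken from the thread.

```sql
-- Hypothetical table standing in for "H"; the column list is trimmed for illustration.
CREATE TABLE page_view_hourly (
  viewTime     INT,
  userid       BIGINT,
  page_url     STRING,
  referrer_url STRING,
  ip           STRING
)
PARTITIONED BY (dt STRING, country STRING, hr INT);

-- Dynamic partitioning may need to be switched on, depending on the Hive version.
SET hive.exec.dynamic.partition=true;

-- dt is static; country and hr are dynamic. Dynamic partition columns must be
-- the last columns of the SELECT, in the same order as in the PARTITION clause.
-- Assumption: viewTime is a Unix timestamp in seconds, so the hour can be derived from it.
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view_hourly PARTITION (dt='2008-06-08', country, hr)
SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, pvs.ip,
       pvs.country,
       hour(from_unixtime(pvs.viewTime)) AS hr;
```

With this layout, an hourly run should only create or overwrite the (dt, country, hr) partitions that its own input rows produce, so partitions written for earlier hours of the same day are left alone.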

On Apr 15, 2011, at 12:49 AM, Erix Yao wrote:

Oh, I see.

Just as in the example we have:

FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
           SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip,
                  pvs.country

The dynamic partition we have is on country, and the other partition is dt.

In this implementation, what if I want to import data into page_view more than once?
Say we import the data hourly: with the current dynamic partition implementation,
the existing country partition will be overwritten!

Is there any way to avoid this other than importing the data only once per day?




2011/4/15 Ning Zhang <nzhang@fb.com>
The INSERT OVERWRITE command will not overwrite the whole table. If you specify a partition
in that table, it will only overwrite that partition. If you specify dynamic partitions, it
will only create/overwrite the partitions whose values appear in the output of the input query
(the values of pvs.country in the example).


On Apr 15, 2011, at 12:31 AM, Erix Yao wrote:

Does this mean that if I want the type field as the partition key, I will have to split the raw
data myself and load the files into the target table?

I see there's an example in the tutorial:

FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
           SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip,
                  pvs.country

but INSERT must overwrite the whole table partition.
Can I insert without the OVERWRITE keyword?


2011/4/15 Ning Zhang <nzhang@fb.com>
The LOAD DATA command only copies the files to the destination directory. It doesn't read the
records of the input file, so it cannot do partitioning based on record values.
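
[Editor's sketch] One way to get value-based partitioning despite this is to LOAD the raw file into an unpartitioned staging table first and then let an INSERT with a dynamic partition do the splitting. The sketch below is a minimal illustration, not a statement of the thread's actual schema: the table names page_view_raw and page_view_by_type, the column lists, and the tab-delimited file layout are all assumptions.

```sql
-- Hypothetical staging table whose columns mirror the raw file;
-- a tab-delimited layout is assumed here.
CREATE TABLE page_view_raw (
  viewTime INT,
  userid   BIGINT,
  page_url STRING,
  type     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- LOAD DATA only copies the file into the table's directory; no records are read.
LOAD DATA LOCAL INPATH '/tmp/pv_2008-06-08_us.txt' INTO TABLE page_view_raw;

-- Hypothetical target table partitioned by (dt, country, type).
CREATE TABLE page_view_by_type (
  viewTime INT,
  userid   BIGINT,
  page_url STRING
)
PARTITIONED BY (dt STRING, country STRING, type STRING);

-- Dynamic partitioning may need to be switched on, depending on the Hive version.
SET hive.exec.dynamic.partition=true;

-- dt and country are static; type is dynamic and is taken from the last SELECT column.
FROM page_view_raw r
INSERT OVERWRITE TABLE page_view_by_type PARTITION (dt='2008-06-08', country='US', type)
SELECT r.viewTime, r.userid, r.page_url, r.type;
```

The INSERT is the step that actually reads the records, which is why the split by type can happen there and not during LOAD DATA.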

On Apr 14, 2011, at 10:52 PM, Erix Yao wrote:

Hi all,
    The dynamic partition feature is amazing, but it only works in the INSERT clause. Can I use
it while loading data into a table?

    For example: LOAD DATA LOCAL INPATH '/tmp/pv_2008-06-08_us.txt' INTO TABLE
page_view PARTITION(date='2008-06-08', country='US', type);
where type is the dynamic partition key taken from the raw data?

This would be very cool! If it is supported, I will not have to categorize the raw data according
to the type column.



--
haitao.yao@Beijing







