hadoop-hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1002) multi-partition inserts
Date Wed, 03 Mar 2010 23:10:27 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840940#action_12840940 ]

Ning Zhang commented on HIVE-1002:
----------------------------------

I think it is good to let the user specify the partition columns just as is done currently.
We will allow the user to leave some partition columns as dynamic partition columns, which means
they don't need to give their values at compile time. The partition into which a row is inserted
is determined at runtime.
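
To make the runtime part concrete, here is a minimal Python sketch (not Hive's actual code) of how one output row might be mapped to its partition directory. The `partition_path` helper is hypothetical. It assumes the partition columns are given in DDL order, that the spec maps each column to a static value or to `None` for a dynamic one, and that the dynamic values are the trailing values of each row in DDL order, as the issue description below proposes. Escaping of special characters in partition values is ignored.

```python
from typing import Dict, Optional, Sequence


def partition_path(
    partition_cols: Sequence[str],
    spec: Dict[str, Optional[str]],
    row: Sequence[str],
) -> str:
    """Build a directory path such as 'ds=2010-03-03/hr=12' for one row.

    Hypothetical helper: partition_cols is the DDL order, spec maps a column
    to its static value, or to None (or omits it) for a dynamic column.
    """
    # Columns without a static value are dynamic: resolved per row at runtime.
    dynamic_cols = [c for c in partition_cols if spec.get(c) is None]
    if len(row) < len(dynamic_cols):
        raise ValueError("row has fewer values than dynamic partition columns")
    # Assumption: dynamic values are the last len(dynamic_cols) row values.
    dynamic_values = list(row[len(row) - len(dynamic_cols):])
    resolved = dict(zip(dynamic_cols, dynamic_values))
    for col in partition_cols:
        if spec.get(col) is not None:
            resolved[col] = spec[col]
    # DDL order fixes the nesting: the first column is the top-level directory.
    return "/".join(f"{col}={resolved[col]}" for col in partition_cols)


print(partition_path(["ds", "hr"], {"ds": "2010-03-03", "hr": None}, ["a1", "b1", "12"]))
# ds=2010-03-03/hr=12
```

Two rows with different trailing values would land in different `hr=` subdirectories under the same static `ds` directory, which is the whole point of a dynamic partition column.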

However, one issue arises when the partition columns in the DML imply an order different
from their order in the DDL: we should throw an error if a static partition column is preceded
by a dynamic partition column. For example
{code}
insert overwrite table T partition (ds, hr=12) select ...
{code}

should throw an error. The reason is that the order of the partition columns determines the
directory hierarchy (hr is a subdirectory of ds), and this order is fixed at create table time.
If we allow the above DML, we need clear semantics: either we change every ds partition
that has a subdirectory hr=12, or we completely overwrite the table and use a different
directory hierarchy (ds being a subdirectory of hr). The first solution is potentially very
expensive and rarely seen in practice. The second solution is potentially dangerous, since
the user could accidentally enter the wrong order and have the whole table overwritten rather
than just some partitions updated. Also, the second case has a workaround: the user could
create another partitioned table with a different partition column ordering and use the
above DML to load data.
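
The check described above could look roughly like the following hypothetical `check_partition_spec` helper, a Python sketch under the same assumptions (columns in DDL order, `None` marking a dynamic column), not Hive's implementation. Walking the columns in DDL order mirrors the directory nesting: once a dynamic column appears, every deeper level is only known at runtime, so a static value below it could only be honored by touching every parent partition.

```python
from typing import Dict, Optional, Sequence


def check_partition_spec(
    partition_cols: Sequence[str],
    spec: Dict[str, Optional[str]],
) -> None:
    """Raise ValueError if a static partition column follows a dynamic one.

    Hypothetical helper: partition_cols is the DDL order; spec maps a column
    to its static value, or to None (or omits it) for a dynamic column.
    """
    first_dynamic: Optional[str] = None
    for col in partition_cols:
        if spec.get(col) is None:
            if first_dynamic is None:
                first_dynamic = col  # every deeper level is runtime-only
        elif first_dynamic is not None:
            raise ValueError(
                f"static partition column {col!r} cannot follow "
                f"dynamic partition column {first_dynamic!r}"
            )


check_partition_spec(["ds", "hr"], {"ds": "2010-03-03", "hr": None})  # accepted
try:
    # The (ds, hr=12) case from the example above.
    check_partition_spec(["ds", "hr"], {"ds": None, "hr": "12"})
except ValueError as err:
    print(err)
# static partition column 'hr' cannot follow dynamic partition column 'ds'
```

Put differently, the static columns must form a prefix of the DDL partition columns, which matches the rule stated in the issue description.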


> multi-partition inserts
> -----------------------
>
>                 Key: HIVE-1002
>                 URL: https://issues.apache.org/jira/browse/HIVE-1002
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Zheng Shao
>            Assignee: Ning Zhang
>
> We should allow queries like this into a partitioned table:
> {code}
> CREATE TABLE x (a STRING, b STRING, c STRING)
> PARTITIONED BY (ds STRING, ts STRING);
> INSERT OVERWRITE TABLE x PARTITION (ds = '2009-12-12')
> SELECT a, b, c, ts FROM xxx;
> {code}
> Basically, allowing users to overwrite multiple partitions at a time.
> The partition values specified in the PARTITION clause (if any) should be a prefix of the partition keys.
> The rest of the partition keys go to the end of the SELECT expression list.
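
A rough Python sketch of these proposed semantics, assuming the static values are a prefix of the partition keys and the remaining keys' values are the last columns of each SELECT row. The `group_rows_by_partition` helper is hypothetical and only illustrates the idea: it groups the data columns of each row by target directory, so every key in the result is one partition the INSERT OVERWRITE would replace.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_rows_by_partition(
    partition_keys: Sequence[str],
    static_values: Sequence[str],
    rows: Sequence[Sequence[str]],
) -> Dict[str, List[Tuple[str, ...]]]:
    """Map each target partition path to the data columns written there."""
    if len(static_values) > len(partition_keys):
        raise ValueError("more static values than partition keys")
    n_dynamic = len(partition_keys) - len(static_values)
    groups: Dict[str, List[Tuple[str, ...]]] = defaultdict(list)
    for row in rows:
        split = len(row) - n_dynamic
        if split < 0:
            raise ValueError("row is shorter than the number of dynamic keys")
        # Leading columns are table data; trailing ones are partition values.
        data, dynamic = tuple(row[:split]), list(row[split:])
        values = list(static_values) + dynamic
        path = "/".join(f"{k}={v}" for k, v in zip(partition_keys, values))
        groups[path].append(data)
    return dict(groups)


# Rows as produced by: SELECT a, b, c, ts FROM xxx
rows = [("a1", "b1", "c1", "t1"), ("a2", "b2", "c2", "t2"), ("a3", "b3", "c3", "t1")]
for path, data in group_rows_by_partition(["ds", "ts"], ["2009-12-12"], rows).items():
    print(path, data)
# ds=2009-12-12/ts=t1 [('a1', 'b1', 'c1'), ('a3', 'b3', 'c3')]
# ds=2009-12-12/ts=t2 [('a2', 'b2', 'c2')]
```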

-- 
This message is automatically generated by JIRA.

