hadoop-hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1602) List Partitioning
Date Fri, 27 Aug 2010 20:40:56 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903608#action_12903608 ]

Namit Jain commented on HIVE-1602:
----------------------------------

To clarify more, this is not list partitioning in the traditional database sense.

For a table T with the following columns: c1, c2 and partitioning column p:

The user should be able to specify which values of p go into which partition, for example:

p = p1, partition name = p1
p = p2, partition name = p2
p = p3, partition name = p3
p = p4,p5,p6,p7, partition name = p4_p7
p = p8,p9,...,p100, partition name = p8_p100


However, a query must return the actual value of p
(e.g., p4, and not p4_p7).
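
To make the requirement concrete, here is a rough Python sketch of the mapping described above. It is not Hive code and does not reflect how Hive would implement this: the names PARTITION_SPEC, partition_for, insert_rows and query are all hypothetical, and it assumes the last entry covers the consecutive values p8 through p100. It only illustrates that several values of p share one partition name while the original value is kept with each row.

```python
from collections import defaultdict

# Hypothetical spec: partition name -> the values of p stored in it.
# Assumption: "p8,p9,...,p100" means the consecutive values p8 through p100.
PARTITION_SPEC = {
    "p1": ["p1"],
    "p2": ["p2"],
    "p3": ["p3"],
    "p4_p7": ["p4", "p5", "p6", "p7"],
    "p8_p100": [f"p{i}" for i in range(8, 101)],
}

# Inverted lookup: value of p -> partition name.
VALUE_TO_PARTITION = {
    value: name
    for name, values in PARTITION_SPEC.items()
    for value in values
}


def partition_for(p_value):
    """Return the name of the partition that holds rows with this p value."""
    return VALUE_TO_PARTITION[p_value]


def insert_rows(rows):
    """Group (c1, c2, p) rows by partition name, keeping the actual p value."""
    storage = defaultdict(list)
    for c1, c2, p in rows:
        storage[partition_for(p)].append((c1, c2, p))
    return dict(storage)


def query(storage, p_value):
    """Return rows whose actual p equals p_value, reading only one partition."""
    name = partition_for(p_value)
    return [row for row in storage.get(name, []) if row[2] == p_value]
```

With this sketch, rows with p = p4 and p = p6 both land in the partition named p4_p7, but query(storage, "p4") returns only the p4 rows, with p4 (not p4_p7) as the value of p.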


> List Partitioning
> -----------------
>
>                 Key: HIVE-1602
>                 URL: https://issues.apache.org/jira/browse/HIVE-1602
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>
> Dynamic partition inserts create partitions based on the dynamic partition (DP) column values.
> Currently, one partition is created for each distinct DP column value. This can produce skewed
> dynamic partitions: some partitions are large, but there may also be a large number of small
> ones, which burdens both HDFS and the metastore. A list partitioning scheme that aggregates a
> number of small partitions into one big one is preferable for skewed partitions.


