hadoop-hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1602) List Partitioning
Date Fri, 27 Aug 2010 22:42:53 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903692#action_12903692 ]

Joydeep Sen Sarma commented on HIVE-1602:
-----------------------------------------

Yeah. But I have been asking how you are planning to make the grouping of partitions transparent.
To me that sounds like a very big and risky change, and there are no details here.

Why would we do this at the Hive layer, given that we already have HAR (Hadoop Archives)?

I really don't understand why we wouldn't start with HIVE-1467 and then add HAR as an optimization
to reduce the number of files for small partitions. This proposal doesn't address the skew case. It
also doesn't address the fact that we still have to partition by the dynamic partitioning columns, and
that requires the same partition-only map-reduce operator that HIVE-1467 requires. At that point,
we can just do HIVE-1467.

What am I missing?

> List Partitioning
> -----------------
>
>                 Key: HIVE-1602
>                 URL: https://issues.apache.org/jira/browse/HIVE-1602
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>
> Dynamic partition inserts create partitions based on the dynamic partition (DP) column values.
> Currently they create one partition for each distinct DP column value. This can result in
> skewed dynamic partitions: some partitions are large, but there can also be a large number
> of small partitions. This puts a burden on both HDFS and the metastore.
> A list partitioning scheme that aggregates a number of small partitions into one big one is
> preferable for skewed partitions.
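To make the proposal in the description concrete, here is a minimal Python sketch contrasting the current behavior (one partition per distinct DP value) with a list-partitioning grouping. It rests on assumptions that are not in the JIRA: the row count of each distinct DP value is known up front, and small values are packed greedily until a hypothetical `min_rows` threshold is reached. The functions `one_partition_per_value` and `list_partitions` are illustrative names only; they are not Hive APIs, and Hive's actual implementation (in Java, inside map-reduce jobs) would look quite different.

```python
from collections import Counter
from typing import Dict, List


def one_partition_per_value(dp_values: List[str]) -> Dict[str, int]:
    """Current DP behavior: one partition per distinct value (value -> row count)."""
    return dict(Counter(dp_values))


def list_partitions(counts: Dict[str, int], min_rows: int) -> List[List[str]]:
    """Hypothetical list-partitioning scheme (illustration only).

    Values with at least `min_rows` rows keep a partition of their own; smaller
    values are packed together until the group reaches `min_rows` rows.
    """
    partitions: List[List[str]] = []
    current: List[str] = []
    current_rows = 0
    # Deterministic order: largest values first, ties broken by value name.
    for value, rows in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if rows >= min_rows:
            partitions.append([value])  # large enough to stand alone
            continue
        current.append(value)
        current_rows += rows
        if current_rows >= min_rows:
            partitions.append(current)  # this list partition is now big enough
            current, current_rows = [], 0
    if current:
        partitions.append(current)  # leftover small values share one partition
    return partitions


if __name__ == "__main__":
    # Skewed input: one large value and several small ones.
    rows = ["us"] * 1000 + ["ca"] * 40 + ["mx"] * 30 + ["fr"] * 20 + ["de"] * 10 + ["jp"] * 5
    counts = one_partition_per_value(rows)
    print(len(counts), "partitions with one per value")
    print(list_partitions(counts, min_rows=50))
```

With `min_rows=50`, the six per-value partitions collapse into three: `["us"]` stays on its own, while the small values are grouped as `["ca", "mx"]` and `["fr", "de", "jp"]`. Note that, as the comment above points out, the rows still have to be routed by DP column value before any grouping can happen.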

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

