hive-dev mailing list archives

From "Paul Yang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1660) Change get_partitions_ps to pass partition filter to database
Date Wed, 13 Oct 2010 22:56:35 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920797#action_12920797 ]

Paul Yang commented on HIVE-1660:
---------------------------------

HIVE-1660.1.patch is the main patch: it creates a listPartitionNamesByFilter() method and
fixes get_partitions_ps() and get_partition_names_ps() to use the new filter APIs. In addition,
the patch adds an optimization that uses a partition name regex for filtering in cases of equality
comparisons.
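As a rough illustration of what handing a partial partition spec to the filter API involves, here is a minimal Python sketch. The helper name partial_spec_to_filter is hypothetical, and the filter syntax (key = "value" clauses joined with "and") is an assumption modeled on HIVE-1609-style filter strings. It is not the code in HIVE-1660.1.patch. An empty value stands for an unspecified key, which matches any value and so contributes no clause.

```python
def partial_spec_to_filter(keys, values):
    """Hypothetical helper: build a metastore-style filter string from a
    partial partition spec. Empty values mean "any value" and are skipped."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    clauses = []
    for key, value in zip(keys, values):
        if not value:
            continue  # unspecified key: no constraint, so no clause
        # Assumption: values contain no double quotes (no escaping is done).
        clauses.append(f'{key} = "{value}"')
    return " and ".join(clauses)


# Partial spec for a table partitioned on (ds, hr): fix ds, leave hr open.
print(partial_spec_to_filter(["ds", "hr"], ["2010-10-01", ""]))
```

For the ds/hr example this yields a single equality clause on ds, which is the kind of equality-only filter the comment says can be served by a partition name regex instead.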

HIVE-1660_regex.patch was a small experiment to measure the potential speedup from filtering
on a more complete regex of the partition name. For example, for a table partitioned
on ds and hr, this patch uses a regex like 'ds=2010-10-01/hr=.*' to find all partitions with
ds='2010-10-01'. For a table with ~5 million partitions and ~15K partitions a day, getting
the partitions for a single day took ~1s with the regex patch vs. ~10s with the filter patch.
Since a table with 5 million partitions is a very unusual case, I didn't think the roughly 10x
speedup was worth the additional complexity.
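A minimal Python sketch of the full-name regex idea, matching names client-side with the standard re module. The helper name partial_spec_to_name_regex is hypothetical, this is not the code in HIVE-1660_regex.patch, and it assumes partition values contain no characters that Hive would escape in a partition name. It uses [^/]* for an unspecified value instead of .* so a wildcard in a middle position cannot run past a '/'.

```python
import re


def partial_spec_to_name_regex(keys, values):
    """Hypothetical helper: build a regex over full partition names such as
    'ds=2010-10-01/hr=12'. Unspecified (empty) values match any value."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    parts = []
    for key, value in zip(keys, values):
        if value:
            # Fixed value: match "key=value" literally.
            parts.append(re.escape(f"{key}={value}"))
        else:
            # Unspecified value: any run of characters within this path segment.
            parts.append(re.escape(f"{key}=") + "[^/]*")
    return "/".join(parts)


names = ["ds=2010-10-01/hr=00", "ds=2010-10-01/hr=23", "ds=2010-10-02/hr=00"]
pattern = re.compile(partial_spec_to_name_regex(["ds", "hr"], ["2010-10-01", ""]))
# fullmatch anchors the regex to the whole partition name.
print([n for n in names if pattern.fullmatch(n)])
```

Only the two ds=2010-10-01 names survive; matching on the whole name in one pass is what lets a single regex replace per-key filter clauses.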

> Change get_partitions_ps to pass partition filter to database
> -------------------------------------------------------------
>
>                 Key: HIVE-1660
>                 URL: https://issues.apache.org/jira/browse/HIVE-1660
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Ajay Kidave
>            Assignee: Paul Yang
>         Attachments: HIVE-1660.1.patch, HIVE-1660_regex.patch
>
>
> Support for doing partition pruning by passing the partition filter to the database was
> added by HIVE-1609. Changing get_partitions_ps to use this could improve performance
> for tables with a large number of partitions. A listPartitionNamesByFilter API might be required
> to implement this for use from Hive.

-- 
This message is automatically generated by JIRA.

