hive-dev mailing list archives

From "Gunther Hagleitner (JIRA)" <>
Subject [jira] [Updated] (HIVE-7826) Dynamic partition pruning on Tez
Date Mon, 25 Aug 2014 08:02:09 GMT


Gunther Hagleitner updated HIVE-7826:

    Attachment: HIVE-7826.3.patch

.3 has various fixes. Should be good to go now.

> Dynamic partition pruning on Tez
> --------------------------------
>                 Key: HIVE-7826
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>            Assignee: Gunther Hagleitner
>              Labels: tez
>         Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch
> It's natural in a star schema to map one or more dimensions to partition columns. Time
or location are likely candidates. 
> It can also be useful to compute the partitions one would like to scan via a subquery
(where p in select ... from ...).
> The resulting joins in Hive require a full table scan of the large table, though, because
partition pruning takes place before the corresponding values are known.
> On Tez it's relatively straightforward to send the values needed to prune to the application
master, where splits are generated and tasks are submitted. Using these values we can strip
out any unneeded partitions dynamically, while the query is running.
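
To make the runtime step concrete, here is a minimal sketch of the pruning itself, not the code in the patch. It assumes the keys from the other side of the join have already arrived at the point where splits are generated. The function name, the partition layout, and the file paths are all hypothetical.

```python
from typing import Dict, Iterable, List


def prune_partitions(
    partitions: Dict[str, List[str]], wanted_keys: Iterable[str]
) -> Dict[str, List[str]]:
    """Keep only the partitions whose value appears among the received keys."""
    keys = set(wanted_keys)
    return {value: files for value, files in partitions.items() if value in keys}


# Hypothetical fact table partitioned by date; each value maps to its files.
sales_partitions = {
    "2014-08-23": ["sales/ds=2014-08-23/000000_0"],
    "2014-08-24": ["sales/ds=2014-08-24/000000_0"],
    "2014-08-25": ["sales/ds=2014-08-25/000000_0"],
}

# Keys computed at run time by the subquery side (assumed values).
keys_from_subquery = ["2014-08-25", "2014-08-26"]

# Only the surviving partitions would be turned into splits.
pruned = prune_partitions(sales_partitions, keys_from_subquery)
```

Keys with no matching partition ("2014-08-26" here) are simply ignored, so the large table is scanned only where the join can produce rows.
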
> The approach is straightforward:
> - Insert synthetic conditions for each join representing "x in (keys of other side in
> - These conditions will be pushed down as far as possible
> - If the condition hits a table scan and the column involved is a partition column:
>    - Set up an operator to send key events to the AM
> - else:
>    - Remove synthetic predicate
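
The planning steps above can be sketched roughly as follows. This is an illustrative toy, not Hive's optimizer: the `TableScan` and `Join` classes and the `plan_dynamic_pruning` function are hypothetical names, and the sketch assumes every synthetic condition can be pushed all the way down to a table scan.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class TableScan:
    table: str
    partition_columns: Set[str]
    # Columns for which key events would be sent to the AM.
    pruning_events: List[str] = field(default_factory=list)


@dataclass
class Join:
    left: TableScan
    left_col: str
    right: TableScan
    right_col: str


def plan_dynamic_pruning(joins: List[Join]) -> List[str]:
    """Decide, per join side, whether the synthetic condition is kept."""
    decisions = []
    for join in joins:
        sides = [
            (join.left, join.left_col, join.right),
            (join.right, join.right_col, join.left),
        ]
        for scan, col, other in sides:
            # Synthetic condition: "col in (keys of the other side)".
            # Assumption: pushdown always reaches the table scan.
            if col in scan.partition_columns:
                scan.pruning_events.append(col)
                decisions.append(
                    f"{scan.table}.{col}: send keys from {other.table} to AM"
                )
            else:
                decisions.append(f"{scan.table}.{col}: remove synthetic predicate")
    return decisions


# Hypothetical star schema: fact table partitioned on ds, unpartitioned dimension.
sales = TableScan("sales", {"ds"})
dates = TableScan("dates", set())
decisions = plan_dynamic_pruning([Join(sales, "ds", dates, "d_date")])
```

In this example only the `sales` side keeps its condition, because `ds` is a partition column there; the condition on `dates.d_date` is dropped since it cannot prune anything.
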

This message was sent by Atlassian JIRA
