hive-issues mailing list archives

From "Deepak Jaiswal (JIRA)" <>
Subject [jira] [Updated] (HIVE-15269) Dynamic Min-Max runtime-filtering for Tez
Date Sun, 22 Jan 2017 08:38:26 GMT


Deepak Jaiswal updated HIVE-15269:
    Attachment: HIVE-15269.17.patch

Updated the patch to work with the latest builds. No new test failures.

> Dynamic Min-Max runtime-filtering for Tez
> -----------------------------------------
>                 Key: HIVE-15269
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Jason Dere
>            Assignee: Deepak Jaiswal
>         Attachments: HIVE-15269.10.patch, HIVE-15269.11.patch, HIVE-15269.12.patch, HIVE-15269.13.patch,
HIVE-15269.14.patch, HIVE-15269.15.patch, HIVE-15269.16.patch, HIVE-15269.17.patch, HIVE-15269.1.patch,
HIVE-15269.2.patch, HIVE-15269.3.patch, HIVE-15269.4.patch, HIVE-15269.5.patch, HIVE-15269.6.patch,
HIVE-15269.7.patch, HIVE-15269.8.patch, HIVE-15269.9.patch
> If a dimension table and fact table are joined:
> {noformat}
> select *
> from store join store_sales on ( = store_sales.store_id)
> where store.s_store_name = 'My Store'
> {noformat}
> One optimization that can be done is to get the min/max store id values that come out
of the scan/filter of the store table, and send this min/max value (via Tez edge) to the task
which is scanning the store_sales table.
> We can add a BETWEEN(min, max) predicate to the store_sales TableScan, where this predicate
can be pushed down to the storage handler (for example for ORC formats). Pushing a min/max
predicate to the ORC reader would allow us to avoid having to read entire row groups during
the table scan.
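
For illustration only, the idea in the description can be sketched in a few lines of Python.
This is not code from the patch and does not use Hive, Tez, or ORC APIs. RowGroup, min_max,
scan_with_range, and the sample data are hypothetical names and values made up for the sketch;
the per-group min/max fields stand in for the column statistics an ORC reader keeps per row group.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a row group carrying min/max statistics."""
    min_id: int
    max_id: int
    rows: List[Tuple[int, float]]  # (store_id, sale_amount)


def min_max(keys: Iterable[int]) -> Optional[Tuple[int, int]]:
    """Min/max of the join keys that survive the dimension-side filter."""
    keys = list(keys)
    if not keys:
        return None  # no dimension rows matched, so no fact row can join
    return min(keys), max(keys)


def scan_with_range(groups: List[RowGroup], bounds: Optional[Tuple[int, int]]):
    """Scan the fact side, skipping groups whose statistics miss the range."""
    if bounds is None:
        return [], 0
    lo, hi = bounds
    out: List[Tuple[int, float]] = []
    groups_read = 0
    for g in groups:
        # The group's [min_id, max_id] does not overlap [lo, hi]: skip it unread.
        if g.max_id < lo or g.min_id > hi:
            continue
        groups_read += 1
        # Equivalent of the added BETWEEN(min, max) predicate on store_id.
        out.extend(r for r in g.rows if lo <= r[0] <= hi)
    return out, groups_read


# Dimension side: scan/filter of "store" (made-up rows: id, name).
store = [(3, "Other"), (7, "My Store"), (9, "My Store"), (20, "Elsewhere")]
bounds = min_max(sid for sid, name in store if name == "My Store")

# Fact side: "store_sales" laid out as three row groups (made-up rows).
groups = [
    RowGroup(1, 5, [(1, 10.0), (5, 2.5)]),
    RowGroup(6, 10, [(6, 1.0), (7, 4.0), (9, 3.0)]),
    RowGroup(11, 25, [(11, 8.0), (20, 6.0)]),
]
rows, groups_read = scan_with_range(groups, bounds)
```

With this sample data only the middle row group is read. The range check is a pre-filter and
not a replacement for the join: a store_id inside the range that did not come out of the store
scan would still pass the BETWEEN predicate, so the join itself still has to run afterwards.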

This message was sent by Atlassian JIRA
