hive-issues mailing list archives

From "Owen O'Malley (JIRA)" <>
Subject [jira] [Closed] (HIVE-15269) Dynamic Min-Max/BloomFilter runtime-filtering for Tez
Date Wed, 26 Jul 2017 00:03:09 GMT


Owen O'Malley closed HIVE-15269.

> Dynamic Min-Max/BloomFilter runtime-filtering for Tez
> -----------------------------------------------------
>                 Key: HIVE-15269
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Tez
>            Reporter: Jason Dere
>            Assignee: Deepak Jaiswal
>              Labels: TODOC2.2.0
>             Fix For: 2.2.0
>         Attachments: HIVE-15269.10.patch, HIVE-15269.11.patch, HIVE-15269.12.patch, HIVE-15269.13.patch,
HIVE-15269.14.patch, HIVE-15269.15.patch, HIVE-15269.16.patch, HIVE-15269.17.patch, HIVE-15269.18.patch,
HIVE-15269.19.patch, HIVE-15269.1.patch, HIVE-15269.2.patch, HIVE-15269.3.patch, HIVE-15269.4.patch,
HIVE-15269.5.patch, HIVE-15269.6.patch, HIVE-15269.7.patch, HIVE-15269.8.patch, HIVE-15269.9.patch
> If a dimension table and fact table are joined:
> {noformat}
> select *
> from store join store_sales on ( = store_sales.store_id)
> where store.s_store_name = 'My Store'
> {noformat}
> One possible optimization is to take the min/max store id values that come out
of the scan/filter of the store table and send them (via a Tez edge) to the task
that scans the store_sales table.
> We can add a BETWEEN(min, max) predicate to the store_sales TableScan, where this predicate
can be pushed down to the storage handler (for example, for ORC formats). Pushing a min/max
predicate to the ORC reader would allow us to avoid reading entire row groups during
the table scan.
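
The idea described above can be illustrated with a small, self-contained Python sketch. It is not Hive or Tez code: the RowGroup class, the function names, and the sample ids are all hypothetical, and the per-group min/max values only stand in for the kind of statistics an ORC reader could consult. The sketch assumes integer join keys and non-empty row groups.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a block of fact-table rows (assumed non-empty)."""
    store_ids: List[int]

    @property
    def min_id(self) -> int:
        return min(self.store_ids)

    @property
    def max_id(self) -> int:
        return max(self.store_ids)


def build_min_max(dim_keys: List[int]) -> Optional[Tuple[int, int]]:
    """Min/max of the join keys that survive the dimension-side filter."""
    if not dim_keys:
        return None  # no dimension rows, so nothing on the fact side can join
    return min(dim_keys), max(dim_keys)


def scan_with_runtime_filter(
    row_groups: List[RowGroup], bounds: Optional[Tuple[int, int]]
) -> Tuple[List[int], int]:
    """Apply a BETWEEN(lo, hi) check, skipping row groups that cannot match.

    Returns the surviving keys and the number of row groups actually read.
    """
    if bounds is None:
        return [], 0
    lo, hi = bounds
    matched: List[int] = []
    groups_read = 0
    for group in row_groups:
        # The group's own min/max shows it lies entirely outside [lo, hi].
        if group.max_id < lo or group.min_id > hi:
            continue
        groups_read += 1
        matched.extend(k for k in group.store_ids if lo <= k <= hi)
    return matched, groups_read


if __name__ == "__main__":
    bounds = build_min_max([42, 45])  # keys left after the dimension filter
    fact = [
        RowGroup([1, 5, 10]),
        RowGroup([40, 41, 42, 43]),
        RowGroup([44, 45, 46]),
        RowGroup([100, 101]),
    ]
    print(scan_with_runtime_filter(fact, bounds))  # ([42, 43, 44, 45], 2)
```

In this sample the first and last row groups are never read. The range check is only a coarse pre-filter: ids 43 and 44 pass it even though they have no match on the dimension side, so the join itself still has to discard them.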

This message was sent by Atlassian JIRA
