hive-issues mailing list archives

From "Ke Jia (JIRA)" <>
Subject [jira] [Commented] (HIVE-15269) Dynamic Min-Max/BloomFilter runtime-filtering for Tez
Date Tue, 23 Jan 2018 06:52:02 GMT


Ke Jia commented on HIVE-15269:

[~djaiswal] thanks for your reply.

> The 2nd GBY–>RS is executed in a Reducer vertex where it aggregates all the min-max
and bloom filters.

Here, when aggregating the min-max and bloom filters, does it calculate the final min-max
and bloom filters, or does it only combine all the min-max and bloom filters? If it does
calculate the final values, why are the final min-max and bloom filters calculated again in [|]?

I also wonder what the keys are in the two "GBY" operations, because the explain plan does
not show them. Thanks for your help!

> Dynamic Min-Max/BloomFilter runtime-filtering for Tez
> -----------------------------------------------------
>                 Key: HIVE-15269
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Tez
>            Reporter: Jason Dere
>            Assignee: Deepak Jaiswal
>            Priority: Major
>              Labels: TODOC2.2.0
>             Fix For: 2.2.0
>         Attachments: HIVE-15269.1.patch, HIVE-15269.10.patch, HIVE-15269.11.patch, HIVE-15269.12.patch,
HIVE-15269.13.patch, HIVE-15269.14.patch, HIVE-15269.15.patch, HIVE-15269.16.patch, HIVE-15269.17.patch,
HIVE-15269.18.patch, HIVE-15269.19.patch, HIVE-15269.2.patch, HIVE-15269.3.patch, HIVE-15269.4.patch,
HIVE-15269.5.patch, HIVE-15269.6.patch, HIVE-15269.7.patch, HIVE-15269.8.patch, HIVE-15269.9.patch
> If a dimension table and fact table are joined:
> {noformat}
> select *
> from store join store_sales on ( = store_sales.store_id)
> where store.s_store_name = 'My Store'
> {noformat}
> One optimization that can be done is to get the min/max store id values that come out
of the scan/filter of the store table, and send this min/max value (via Tez edge) to the task
which is scanning the store_sales table.
> We can add a BETWEEN(min, max) predicate to the store_sales TableScan, where this predicate
can be pushed down to the storage handler (for example for ORC formats). Pushing a min/max
predicate to the ORC reader would allow us to avoid having to read entire row groups during
the table scan.
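
The idea in the description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hive's or Tez's actual implementation: the `BloomFilter` class, its bit-set size and hash count, and the helper names `build_partial`, `merge_partials`, and `filter_fact_rows` are all hypothetical. It shows each task building a partial min/max and Bloom filter over the dimension-side join keys, a single step merging those partials, and the merged result being applied to fact-side rows as a BETWEEN(min, max) check plus a Bloom filter probe.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter for illustration; sizes are arbitrary assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))

    def merge(self, other):
        # Bitwise OR combines two filters built with the same parameters.
        self.bits |= other.bits


def build_partial(keys):
    """Per-task partial aggregate over a non-empty list of join keys."""
    bloom = BloomFilter()
    for key in keys:
        bloom.add(key)
    return min(keys), max(keys), bloom


def merge_partials(partials):
    """Combine per-task partials into one final (min, max, bloom)."""
    lo = min(p[0] for p in partials)
    hi = max(p[1] for p in partials)
    bloom = BloomFilter()
    for _, _, partial_bloom in partials:
        bloom.merge(partial_bloom)
    return lo, hi, bloom


def filter_fact_rows(rows, lo, hi, bloom):
    """Keep rows whose store_id passes BETWEEN(lo, hi) and the Bloom probe."""
    return [
        row for row in rows
        if lo <= row["store_id"] <= hi and bloom.might_contain(row["store_id"])
    ]
```

In this sketch the min/max check is the part that could be pushed down to a storage reader, since a range comparison can be evaluated against per-row-group statistics. The Bloom probe can only be applied row by row, and it may let through some non-matching keys (false positives) but never drops a matching one.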

This message was sent by Atlassian JIRA
