hive-issues mailing list archives

From "Lefty Leverenz (JIRA)" <>
Subject [jira] [Commented] (HIVE-15269) Dynamic Min-Max/BloomFilter runtime-filtering for Tez
Date Wed, 29 Mar 2017 07:53:42 GMT


Lefty Leverenz commented on HIVE-15269:

Thanks for noticing the configs, [~cartershanklin].  This ticket doesn't have to be reopened;
we just add the appropriate TODOC label.  In this case, it's TODOC2.2.0 (a new label because
TODOC2.2 has become ambiguous). I checked in Owen's branch-2.2, and this patch's
configs are there.

This patch adds two configs:

* *hive.tez.dynamic.semijoin.reduction*
* *hive.tez.max.bloom.filter.entries*

The other two configs mentioned above are from other tickets, which already have TODOC labels
but aren't in Owen's branch-2.2:

* *hive.tez.dynamic.semijoin.reduction.threshold* -- created by HIVE-16154 in 2.3.0
* *hive.tez.bigtable.minsize.semijoin.reduction* -- created by HIVE-16260 in 2.3.0

They all belong in the Tez section of Configuration Properties:

* [Configuration Properties -- Tez |]

Adding a TODOC2.2.0 label.

> Dynamic Min-Max/BloomFilter runtime-filtering for Tez
> -----------------------------------------------------
>                 Key: HIVE-15269
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Tez
>            Reporter: Jason Dere
>            Assignee: Deepak Jaiswal
>             Fix For: 2.2.0
>         Attachments: HIVE-15269.10.patch, HIVE-15269.11.patch, HIVE-15269.12.patch, HIVE-15269.13.patch,
HIVE-15269.14.patch, HIVE-15269.15.patch, HIVE-15269.16.patch, HIVE-15269.17.patch, HIVE-15269.18.patch,
HIVE-15269.19.patch, HIVE-15269.1.patch, HIVE-15269.2.patch, HIVE-15269.3.patch, HIVE-15269.4.patch,
HIVE-15269.5.patch, HIVE-15269.6.patch, HIVE-15269.7.patch, HIVE-15269.8.patch, HIVE-15269.9.patch
> If a dimension table and fact table are joined:
> {noformat}
> select *
> from store join store_sales on ( = store_sales.store_id)
> where store.s_store_name = 'My Store'
> {noformat}
> One optimization that can be done is to get the min/max store id values that come out
of the scan/filter of the store table, and send this min/max value (via Tez edge) to the task
which is scanning the store_sales table.
> We can add a BETWEEN(min, max) predicate to the store_sales TableScan, where this predicate
can be pushed down to the storage handler (for example, for ORC formats). Pushing a min/max
predicate to the ORC reader would allow us to avoid having to read whole row groups during
the table scan.
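
The min/max idea in the description above can be illustrated with a small, self-contained sketch. This is not Hive or ORC code: the RowGroup class, the function names, and the sample data are all hypothetical, and the only assumption is that each row group carries min/max statistics for the join column. It shows how a range computed from the filtered dimension side lets the fact-side scan skip row groups whose statistics fall outside that range.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a row group with min/max column statistics."""
    min_store_id: int
    max_store_id: int
    rows: List[Tuple[int, float]]  # (store_id, amount)


def min_max(keys: Iterable[int]) -> Optional[Tuple[int, int]]:
    """Compute the (min, max) of the join keys surviving the dimension filter."""
    keys = list(keys)
    if not keys:
        return None  # no dimension rows: nothing on the fact side can match
    return min(keys), max(keys)


def scan_with_runtime_filter(
    row_groups: Iterable[RowGroup],
    key_range: Optional[Tuple[int, int]],
) -> List[Tuple[int, float]]:
    """Scan the fact side, skipping row groups outside the key range."""
    if key_range is None:
        return []
    lo, hi = key_range
    out: List[Tuple[int, float]] = []
    for group in row_groups:
        # Statistics-based skip: the whole group is outside [lo, hi].
        if group.max_store_id < lo or group.min_store_id > hi:
            continue
        # Row-level BETWEEN(lo, hi) check for groups that overlap the range.
        out.extend(row for row in group.rows if lo <= row[0] <= hi)
    return out


# Hypothetical data: the dimension filter leaves store ids 5 and 7.
dimension_keys = [5, 7]
fact_row_groups = [
    RowGroup(1, 3, [(1, 10.0), (3, 11.0)]),           # skipped entirely
    RowGroup(4, 6, [(4, 1.0), (5, 2.0), (6, 3.0)]),   # partially kept
    RowGroup(8, 9, [(8, 4.0), (9, 5.0)]),             # skipped entirely
]
filtered = scan_with_runtime_filter(fact_row_groups, min_max(dimension_keys))
```

In this sketch the row with store id 6 still passes the range check even though 6 is not one of the dimension keys. A range filter only narrows the scan; the join itself, or a Bloom filter as named in the issue title, removes the remaining non-matching rows.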
