hive-dev mailing list archives

From "Jason Dere (JIRA)" <>
Subject [jira] [Created] (HIVE-15269) Dynamic Min-Max runtime-filtering for Tez
Date Wed, 23 Nov 2016 07:08:58 GMT
Jason Dere created HIVE-15269:

             Summary: Dynamic Min-Max runtime-filtering for Tez
                 Key: HIVE-15269
             Project: Hive
          Issue Type: Bug
            Reporter: Jason Dere
            Assignee: Deepak Jaiswal

If a dimension table and fact table are joined:
select *
from store join store_sales on ( = store_sales.store_id)
where store.s_store_name = 'My Store'

One possible optimization is to compute the min/max store id values that come out of the
scan/filter of the store table, and send these values (via a Tez edge) to the task that
is scanning the store_sales table.
We can then add a BETWEEN(min, max) predicate to the store_sales TableScan, and this predicate
can be pushed down to the storage handler (for example, for ORC formats). Pushing a min/max
predicate to the ORC reader would allow us to avoid reading entire row groups during
the table scan.
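
To make the idea concrete, here is a minimal sketch in Python of how a min/max pair from the
dimension side could prune row groups on the fact side. It is an illustration only, not Hive,
Tez, or ORC code: the RowGroup class, the function names, and the "store_id" key on the store
rows are all hypothetical, and real ORC readers keep min/max statistics in their indexes rather
than recomputing them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for an ORC row group of store_sales.store_id values."""
    store_ids: List[int]

    @property
    def min_id(self) -> int:
        # A real reader would take this from stored column statistics.
        return min(self.store_ids)

    @property
    def max_id(self) -> int:
        return max(self.store_ids)


def dimension_min_max(store_rows: List[Dict], store_name: str) -> Optional[Tuple[int, int]]:
    """Scan/filter the dimension table and return (min, max) of the surviving ids.

    The "store_id" key is an assumed column name used only for this sketch.
    Returns None when no dimension row passes the filter.
    """
    ids = [r["store_id"] for r in store_rows if r["s_store_name"] == store_name]
    if not ids:
        return None
    return min(ids), max(ids)


def scan_with_min_max(row_groups: List[RowGroup],
                      bounds: Optional[Tuple[int, int]]) -> Tuple[List[int], int]:
    """Apply BETWEEN(min, max) to the fact-table scan.

    Row groups whose [min_id, max_id] range cannot overlap the bounds are skipped
    without reading their rows. Returns the surviving ids and the number of row
    groups that were actually read.
    """
    if bounds is None:
        # No dimension rows matched, so no fact row can join.
        return [], 0
    lo, hi = bounds
    kept: List[int] = []
    groups_read = 0
    for rg in row_groups:
        if rg.max_id < lo or rg.min_id > hi:
            continue  # whole row group lies outside the range: skip it
        groups_read += 1
        kept.extend(i for i in rg.store_ids if lo <= i <= hi)
    return kept, groups_read
```

Note that the range check is only a coarse pre-filter: ids that fall between min and max but do
not belong to a matching store still pass the scan, so the join itself must still perform the
exact match.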

