hive-issues mailing list archives

From "ASF GitHub Bot (Jira)" <>
Subject [jira] [Work logged] (HIVE-20683) Add the Ability to push Dynamic Between and Bloom filters to Druid
Date Mon, 02 Sep 2019 17:05:00 GMT


ASF GitHub Bot logged work on HIVE-20683:

                Author: ASF GitHub Bot
            Created on: 02/Sep/19 17:04
            Start Date: 02/Sep/19 17:04
    Worklog Time Spent: 10m 
      Work Description: nishantmonu51 commented on pull request #723: [HIVE-20683] Add the Ability to push Dynamic Between and Bloom filters to Druid

 File path: ql/src/test/results/clientpositive/druid/druidmini_expressions.q.out
 @@ -1868,9 +1868,9 @@ POSTHOOK: query: SELECT DATE_ADD(cast(`__time` as date), CAST((cdouble / 1000) A
 POSTHOOK: Input: default@druid_table_alltypesorc
 POSTHOOK: Output: hdfs://### HDFS PATH ###
-1969-02-26	1970-11-04
-1969-03-19	1970-10-14
-1969-11-13	1970-02-17
+1969-12-15	1970-01-16
+1969-12-15	1970-01-16
+1969-12-15	1970-01-16
 Review comment:
   Checked it out; the result of DATE_ADD(cast(__time as date), CAST((cdouble / 1000) changed because the cdouble values changed due to the change in rollup.
   Added more columns to this query in this PR to make things clearer.
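The point of the review comment is that the expected DATE_ADD output changed because Druid rollup aggregates raw rows at ingestion, so queries see aggregated cdouble values rather than the original per-row ones. A minimal sketch of that effect, in plain Python with hypothetical values and an assumed SUM aggregator (not Druid's actual ingestion code):

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical raw rows: (day-truncated __time, cdouble). With rollup enabled,
# Druid combines rows that share the same truncated timestamp at ingestion.
raw_rows = [
    (date(1969, 12, 31), 1000.0),
    (date(1969, 12, 31), 2000.0),
]

# Simulate rollup with a SUM aggregator on cdouble (assumed metric spec).
rolled = defaultdict(float)
for ts, cdouble in raw_rows:
    rolled[ts] += cdouble

# DATE_ADD(cast(__time as date), CAST(cdouble / 1000 AS INT)) now sees the
# rolled-up cdouble (3000.0 -> +3 days), not the per-row values
# (1000.0 -> +1 day, 2000.0 -> +2 days), so the expected dates shift.
results = {ts: ts + timedelta(days=int(cd / 1000)) for ts, cd in rolled.items()}
```

This is why the `.q.out` golden file above needed updating after the rollup change, even though the query itself did not change.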
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
Issue Time Tracking

    Worklog Id:     (was: 305289)
    Time Spent: 2h 10m  (was: 2h)

> Add the Ability to push Dynamic Between and Bloom filters to Druid
> ------------------------------------------------------------------
>                 Key: HIVE-20683
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Druid integration
>            Reporter: Nishant Bangarwa
>            Assignee: Nishant Bangarwa
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-20683.1.patch, HIVE-20683.2.patch, HIVE-20683.3.patch, HIVE-20683.4.patch,
HIVE-20683.5.patch, HIVE-20683.6.patch, HIVE-20683.8.patch, HIVE-20683.patch
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
> For optimizing joins, Hive generates a BETWEEN filter with min-max values and a BLOOM filter for filtering one side of a semi-join.
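The semi-join reduction described above can be sketched as follows. This is a hedged illustration with made-up keys and rows: the build side of the join produces min/max bounds and a Bloom filter, and the probe-side scan applies `key BETWEEN min AND max AND in_bloom_filter(key, bloom)` before the join. A Python set stands in for the Bloom filter (a real Bloom filter only adds false positives, never false negatives):

```python
# Keys from the small (build) side of the join -- hypothetical values.
build_keys = [42, 7, 99]
lo, hi = min(build_keys), max(build_keys)    # BETWEEN min-max bounds
bloom = set(build_keys)                      # stand-in for the Bloom filter

# Rows on the large (probe) side: (join key, payload).
probe_rows = [(5, "a"), (7, "b"), (50, "c"), (99, "d"), (120, "e")]

# Rows surviving the pushed-down filters before the actual join runs.
survivors = [
    (k, v) for k, v in probe_rows
    if lo <= k <= hi and k in bloom          # BETWEEN + Bloom pushdown
]
```

Row (50, "c") falls inside the min-max range but is rejected by the Bloom check, which is exactly the extra pruning the Bloom filter buys over BETWEEN alone.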
> Druid 0.13.0 will have support for Bloom filters (added via …).
> Implementation details - 
> # Hive generates and passes the filters as part of 'filterExpr' in the TableScan.
> # DruidQueryBasedRecordReader gets this filter passed as part of the conf.
> # During the execution phase, before sending the query to Druid, DruidQueryBasedRecordReader deserializes this filter, translates it into a DruidDimFilter, and adds it to the existing DruidQuery. The Tez executor already ensures that all the dynamic values are initialized by the time we start reading results from the record reader.
> # Explaining a Druid query also prints the query sent to Druid as {{druid.json.query}}, so we also need to update that query with the filters. During explain we do not yet have the actual dynamic values, so instead of values we print the dynamic expression itself as part of the Druid query.
> Note: this work needs Druid to be updated to version 0.13.0.
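Step 3 above (translating the deserialized Hive filter into a Druid dim filter) could look roughly like this in Druid's native filter JSON. The field names follow Druid's query spec ("bound" for BETWEEN, "bloom" for the Bloom filter, the latter needing the druid-bloom-filter extension); the dimension name and the base64 bloomKFilter payload are placeholders, and the real translation happens in Java inside Hive, not in Python:

```python
import json

def druid_and_filter(dimension, lower, upper, bloom_b64):
    """Sketch of the combined DruidDimFilter JSON added to the DruidQuery.

    dimension / bloom_b64 are hypothetical; a real bloomKFilter is the
    base64-serialized filter produced by the Hive side of the semi-join.
    """
    return {
        "type": "and",
        "fields": [
            {   # BETWEEN min-max -> Druid "bound" filter
                "type": "bound",
                "dimension": dimension,
                "lower": str(lower),
                "upper": str(upper),
                "ordering": "numeric",
            },
            {   # Hive Bloom filter -> Druid "bloom" filter
                # (requires the druid-bloom-filter extension on the cluster)
                "type": "bloom",
                "dimension": dimension,
                "bloomKFilter": bloom_b64,
            },
        ],
    }

query_filter = druid_and_filter("cint", 7, 99, "<base64-serialized-filter>")
print(json.dumps(query_filter, indent=2))
```

During EXPLAIN, where the dynamic values are not yet known, the lower/upper and bloomKFilter slots would hold the dynamic expression text instead of concrete values, matching step 4 above.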

This message was sent by Atlassian Jira
