drill-dev mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3803) Support inequality filter evaluation as part of join operators
Date Fri, 18 Sep 2015 17:58:04 GMT
Aman Sinha created DRILL-3803:

             Summary: Support inequality filter evaluation as part of join operators
                 Key: DRILL-3803
                 URL: https://issues.apache.org/jira/browse/DRILL-3803
             Project: Apache Drill
          Issue Type: Improvement
          Components: Execution - Relational Operators
            Reporter: Aman Sinha
            Assignee: Aman Sinha

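Currently, Drill evaluates an inequality join condition in a separate Filter operator after the equality join. In the plan below, Filter(condition=[<($1, $4)]) sits above HashJoin(condition=[=($0, $3)]):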
0: jdbc:drill:zk=local> explain plan for select n1.n_name from cp.`tpch/nation.parquet` n1
inner join cp.`tpch/region.parquet` n2 on n1.n_nationkey = n2.n_nationkey
and n1.n_regionkey < n2.n_regionkey;
| text | json |
| 00-00    Screen
00-01      Project(n_name=[$2])
00-02        SelectionVectorRemover
00-03          Filter(condition=[<($1, $4)])
00-04            HashJoin(condition=[=($0, $3)], joinType=[inner])
00-06              Project(n_nationkey=[$2], n_regionkey=[$0], n_name=[$1])
00-08                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, columns=[`n_nationkey`, `n_regionkey`,
00-05              Project(n_nationkey0=[$0], n_regionkey0=[$1])
00-07                Project(n_nationkey=[$1], n_regionkey=[$0])
00-09                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], selectionRoot=classpath:/tpch/region.parquet, numFiles=1, columns=[`n_nationkey`, `n_regionkey`]]])
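
As a rough illustration of the plan shape above, the following Java sketch performs a hash join on key equality and only afterwards applies the inequality in a separate filter pass. It is a hypothetical model, not Drill's operator code: rows are plain int[]{key, value} arrays, and the class and method names (JoinThenFilter, hashJoin, filterLessThan) are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the current plan shape (not Drill code):
// HashJoin on key equality, then a separate Filter for the inequality.
// Input rows are int[]{key, value}; joined rows are int[]{key, leftValue, rightValue}.
final class JoinThenFilter {

    // Equality-only hash join: emits every pair whose keys match.
    static List<int[]> hashJoin(List<int[]> left, List<int[]> right) {
        // Build phase: index the right input by its join key.
        Map<Integer, List<int[]>> table = new HashMap<>();
        for (int[] r : right) {
            table.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }
        // Probe phase: no inequality check here, so every key match is kept.
        List<int[]> joined = new ArrayList<>();
        for (int[] l : left) {
            for (int[] r : table.getOrDefault(l[0], List.of())) {
                joined.add(new int[] {l[0], l[1], r[1]});
            }
        }
        return joined;
    }

    // Separate filter pass over the join output: keep rows with leftValue < rightValue.
    static List<int[]> filterLessThan(List<int[]> joined) {
        List<int[]> out = new ArrayList<>();
        for (int[] row : joined) {
            if (row[1] < row[2]) {
                out.add(row);
            }
        }
        return out;
    }
}
```

Here hashJoin materializes every equality match even when filterLessThan later discards most of them, which is the cost described below when the inequality is selective.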

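Suppose the inequality filter is highly selective but the output cardinality of the equality join is large. In that case it would be substantially better to push the filter into the join and evaluate both the equality and the inequality conditions as part of the join operator, so that rows failing the inequality are never emitted by the join only to be discarded by a separate Filter.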
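
A minimal sketch of the proposed behavior, under the same assumptions as before (hypothetical class and method names, int[]{key, value} rows, inner join only): the inequality is passed to the hash join as a residual predicate and checked inside the probe loop, so non-qualifying pairs are never added to the output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch (not Drill code): inner hash join that evaluates a
// residual non-equality condition while probing, instead of in a later Filter.
final class HashJoinWithResidual {

    static List<int[]> join(List<int[]> left, List<int[]> right,
                            BiPredicate<int[], int[]> residual) {
        // Build phase: index the right input by its equality key (element 0).
        Map<Integer, List<int[]>> table = new HashMap<>();
        for (int[] r : right) {
            table.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
        }
        // Probe phase: a pair is emitted only if the keys match AND the
        // residual condition holds, so no intermediate join result is built.
        List<int[]> out = new ArrayList<>();
        for (int[] l : left) {
            for (int[] r : table.getOrDefault(l[0], List.of())) {
                if (residual.test(l, r)) {
                    out.add(new int[] {l[0], l[1], r[1]});
                }
            }
        }
        return out;
    }
}
```

For example, join(left, right, (l, r) -> l[1] < r[1]) models the "n1.n_regionkey < n2.n_regionkey" condition from the query above. This only covers inner joins: for an outer join, a probe row whose candidates all fail the residual must still be emitted once with nulls on the other side, so the operator would also need to track per-row match status.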

This is an enhancement. We may decide later to split it into two JIRAs: one for HashJoin and one for MergeJoin.
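
If the work is split per operator, the same idea applies to MergeJoin. The sketch below is again hypothetical (names and row layout assumed, inner join only): both inputs must already be sorted ascending on the equality key, and the residual inequality is evaluated while each left row is paired with the right-side group that shares its key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch (not Drill code): inner merge join with a residual
// condition. Both inputs must be sorted ascending on element 0 (the key).
final class MergeJoinWithResidual {

    static List<int[]> join(List<int[]> left, List<int[]> right,
                            BiPredicate<int[], int[]> residual) {
        List<int[]> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int lk = left.get(i)[0];
            int rk = right.get(j)[0];
            if (lk < rk) {
                i++;                      // advance the side with the smaller key
            } else if (lk > rk) {
                j++;
            } else {
                // Find the end of the right-side group with this key.
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd)[0] == rk) {
                    jEnd++;
                }
                // Pair each left row in the group with the right group,
                // keeping only pairs that satisfy the residual condition.
                while (i < left.size() && left.get(i)[0] == lk) {
                    int[] l = left.get(i);
                    for (int k = j; k < jEnd; k++) {
                        int[] r = right.get(k);
                        if (residual.test(l, r)) {
                            out.add(new int[] {lk, l[1], r[1]});
                        }
                    }
                    i++;
                }
                j = jEnd;
            }
        }
        return out;
    }
}
```

With sorted inputs this produces the same rows as the hash-join sketch; the residual check is simply placed in the inner loop over the matching key group.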

This message was sent by Atlassian JIRA