drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-485) Support non-equijoins as long as there is at least 1 equijoin condition between the same 2 tables
Date Thu, 12 Jun 2014 01:36:02 GMT

    [ https://issues.apache.org/jira/browse/DRILL-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028700#comment-14028700
] 

Aman Sinha commented on DRILL-485:
----------------------------------

As mentioned in my previous comment, this has been fixed.  I tested it again on latest build
(commit level 27a9c98) with the following queries .. both Explain plans and correctness of
results w.r.t Postgres.   

Query 1: (Note the Filter above the HashJoin.  The Filter has the inequality '<' condition)

0: jdbc:drill:zk=local> explain plan for select cast(c.c_name as varchar(20)) from orders
o, customer c where o.o_custkey = c.c_custkey and o.o_orderkey < c.c_nationkey;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      UnionExchange
01-01        Project(EXPR$0=[CAST($4):VARCHAR(20) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"])
01-02          SelectionVectorRemover
01-03            Filter(condition=[<($1, $3)])
01-04              HashJoin(condition=[=($0, $2)], joinType=[inner])
01-06                Project(o_custkey=[$1], o_orderkey=[$0])
01-07                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/orders]],
selectionRoot=/Users/asinha/data/tpchmulti/orders, columns=[SchemaPath [`o_custkey`], SchemaPath
[`o_orderkey`]]]])
01-05                BroadcastExchange
02-01                  Project(c_custkey=[$2], c_nationkey=[$1], c_name=[$0])
02-02                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/customer]],
selectionRoot=/Users/asinha/data/tpchmulti/customer, columns=[SchemaPath [`c_custkey`], SchemaPath
[`c_nationkey`], SchemaPath [`c_name`]]]])  

0: jdbc:drill:zk=local> select cast(c.c_name as varchar(20)) from orders o, customer c
where o.o_custkey = c.c_custkey and o.o_orderkey < c.c_nationkey;
+------------+
|   EXPR$0   |
+------------+
| Customer#000000370 |
| Customer#000000781 |
| Customer#000001369 |
| Customer#000000445 |
| Customer#000000557 |
| Customer#000000392 |
+------------+
6 rows selected

Query 2:  (Note the Filter above the HashJoin.  The Filter has the OR condition). 

0: jdbc:drill:zk=local> explain plan for select cast(c.c_name as varchar(20)) from orders
o, customer c where o.o_custkey = c.c_custkey and (o.o_orderkey = c.c_nationkey or o.o_custkey
= c.c_nationkey);
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      UnionExchange
01-01        Project(EXPR$0=[CAST($4):VARCHAR(20) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"])
01-02          SelectionVectorRemover
01-03            Filter(condition=[OR(=($1, $3), =($0, $3))])
01-04              HashJoin(condition=[=($0, $2)], joinType=[inner])
01-06                Project(o_custkey=[$1], o_orderkey=[$0])
01-07                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/orders]],
selectionRoot=/Users/asinha/data/tpchmulti/orders, columns=[SchemaPath [`o_custkey`], SchemaPath
[`o_orderkey`]]]])
01-05                BroadcastExchange
02-01                  Project(c_custkey=[$2], c_nationkey=[$1], c_name=[$0])
02-02                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/customer]],
selectionRoot=/Users/asinha/data/tpchmulti/customer, columns=[SchemaPath [`c_custkey`], SchemaPath
[`c_nationkey`], SchemaPath [`c_name`]]]])

0: jdbc:drill:zk=local> select cast(c.c_name as varchar(20)) from orders o, customer c
where o.o_custkey = c.c_custkey and (o.o_orderkey = c.c_nationkey or o.o_custkey = c.c_nationkey);
+------------+
|   EXPR$0   |
+------------+
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
+------------+
31 rows selected



> Support non-equijoins as long as there is at least 1 equijoin condition between the same
2 tables
> -------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-485
>                 URL: https://issues.apache.org/jira/browse/DRILL-485
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>             Fix For: 1.0.0-BETA1
>
>
> Currently, if the query has a non-equijoin between tables t1, t2 we throw an error that
it is unsupported.  However, if there is at least one equijoin between the tables t1, t2 in
addition to the non-equijoin, then we should perform the join and then do a filter on top
using the non-equijoin condition.  
> example queries where this could be applicable: 
>   SELECT a1 FROM t1, t2 WHERE b1 = b2 AND c1 < c2;
>   SELECT a1 FROM t1, t2 WHERE b1 = b2 AND (c1 = c2 OR d1 = d2);
>   



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message