drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Victoria Markman (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-2454) Support for local join filter in outer joins
Date Fri, 13 Mar 2015 22:13:38 GMT
Victoria Markman created DRILL-2454:
---------------------------------------

             Summary: Support for local join filter in outer joins
                 Key: DRILL-2454
                 URL: https://issues.apache.org/jira/browse/DRILL-2454
             Project: Apache Drill
          Issue Type: Improvement
          Components: Execution - Relational Operators
            Reporter: Victoria Markman
            Assignee: Chris Westin


Currently in the case of outer join, we can't have local join filter. "Can not plan" exception
will be thrown.

Below you will find examples of queries that are going to fail:

-- Join filter on the left side of the left outer join 
{code:sql}
select
        count(*)
from
        cp.`tpch/orders.parquet` o
                left outer join
        cp.`tpch/lineitem.parquet` l
on      (o.o_orderkey = l.l_orderkey and o.o_orderkey = 1);
{code}

-- Join filter on the right side of the right outer join
{code:sql}
select
        count(*)
from
        cp.`tpch/lineitem.parquet` l
                right outer join
        cp.`tpch/orders.parquet` o
on      (o.o_orderkey = l.l_orderkey and o.o_orderkey >= 1)
where
        o.o_orderkey = 1;
{code}

Full outer join with the join filter on the left side:
{code:sql}
select
        count(*)
from
        cp.`tpch/orders.parquet` o
                full outer join
        cp.`tpch/lineitem.parquet` l

on      (o.o_orderkey = l.l_orderkey and o.o_orderkey >= 1)
;
{code}

Full outer join with the join filter on the right side:
{code:sql}
select
        count(*)
from
        cp.`tpch/orders.parquet` o
                full outer join
        cp.`tpch/lineitem.parquet` l
on      (o.o_orderkey = l.l_orderkey and l.l_orderkey >= 1)
;
{code}

The only case that will currently work is an inner join:
{code:sql}
select
        count(*)
from
        cp.`tpch/orders.parquet` o
                inner join
        cp.`tpch/lineitem.parquet` l

on      (o.o_orderkey = l.l_orderkey and o.o_orderkey >= 1)
;
{code}

As a reminder, queries with filter in the ON clause of the join condition are semantically
different from those with the same condition in the WHERE clause.

This query will return an order and its lines, if any, for order number =100:
{code:sql}
select  *
from
        cp.`tpch/orders.parquet` o
                left outer join
        cp.`tpch/lineitem.parquet` l

on      (o.o_orderkey = l.l_orderkey)
where   o.o_orderkey = 100
;
{code}

This query will return ALL orders, but only order 100 will have any lines associated with
it:
{code:sql}
select  *
from
        cp.`tpch/orders.parquet` o
                left outer join
        cp.`tpch/lineitem.parquet` l
on      (o.o_orderkey = l.l_orderkey and o.o_orderkey = 100)
;
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message