drill-issues mailing list archives

From "Gautam Kumar Parai (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-4771) Drill should avoid doing the same join twice if count(distinct) exists
Date Fri, 08 Jul 2016 21:10:11 GMT
Gautam Kumar Parai created DRILL-4771:
-----------------------------------------

             Summary: Drill should avoid doing the same join twice if count(distinct) exists
                 Key: DRILL-4771
                 URL: https://issues.apache.org/jira/browse/DRILL-4771
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.6.0
            Reporter: Gautam Kumar Parai
            Assignee: Gautam Kumar Parai


When a query has one distinct aggregate and one or more non-distinct aggregates, Drill need
not produce the join-based plan. We can generate multi-phase aggregates instead. Another
approach would be to use grouping sets. However, Drill does not support grouping sets and
therefore relies on the join-based plan, which evaluates the same join twice (see the plan below).

{code}
select emp.empno, count(*), avg(distinct dept.deptno) 
from sales.emp emp inner join sales.dept dept 
on emp.deptno = dept.deptno 
group by emp.empno

LogicalProject(EMPNO=[$0], EXPR$1=[$1], EXPR$2=[$3])
  LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner])
    LogicalAggregate(group=[{0}], EXPR$1=[COUNT()])
      LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
        LogicalJoin(condition=[=($7, $9)], joinType=[inner])
          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
          LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
    LogicalAggregate(group=[{0}], EXPR$2=[AVG($1)])
      LogicalAggregate(group=[{0, 1}])
        LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
          LogicalJoin(condition=[=($7, $9)], joinType=[inner])
            LogicalTableScan(table=[[CATALOG, SALES, EMP]])
            LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{code}

A more efficient plan performs the join once and aggregates in two phases:

{code}

select emp.empno, count(*), avg(distinct dept.deptno) 
from sales.emp emp inner join sales.dept dept 
on emp.deptno = dept.deptno 
group by emp.empno

LogicalAggregate(group=[{0}], EXPR$1=[SUM($2)], EXPR$2=[AVG($1)])
  LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT()])
    LogicalProject(EMPNO=[$0], DEPTNO0=[$9])
      LogicalJoin(condition=[=($7, $9)], joinType=[inner])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalTableScan(table=[[CATALOG, SALES, DEPT]])

{code}
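The equivalence of the two plans above can be checked outside Drill. The following is a minimal sketch, not Drill code: it uses Python's built-in sqlite3 module, a simplified two-column schema, and made-up rows (all of these are assumptions for illustration). It runs the original query, a hand-written SQL rendering of the join-based plan, and a hand-written rendering of the two-phase plan, then compares the results.

```python
import sqlite3

# Hypothetical, simplified stand-ins for sales.emp and sales.dept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (empno INTEGER, deptno INTEGER);
CREATE TABLE dept (deptno INTEGER);
-- Made-up rows; duplicates are deliberate so that DISTINCT matters.
INSERT INTO emp VALUES (1, 10), (1, 20), (2, 20), (2, 30), (2, 30), (3, 99);
INSERT INTO dept VALUES (10), (10), (20), (30);
""")

JOIN = "emp INNER JOIN dept ON emp.deptno = dept.deptno"

# The query from the issue, evaluated directly.
original = conn.execute(f"""
    SELECT emp.empno, COUNT(*), AVG(DISTINCT dept.deptno)
    FROM {JOIN}
    GROUP BY emp.empno
    ORDER BY emp.empno
""").fetchall()

# Join-based shape: the emp/dept join appears twice, and the two
# aggregated branches are joined back together on empno.
# SQLite's IS operator plays the role of IS NOT DISTINCT FROM.
join_based = conn.execute(f"""
    SELECT a.empno, a.cnt, b.avg_deptno
    FROM (SELECT emp.empno AS empno, COUNT(*) AS cnt
          FROM {JOIN}
          GROUP BY emp.empno) AS a
    INNER JOIN
         (SELECT d.empno AS empno, AVG(d.deptno) AS avg_deptno
          FROM (SELECT DISTINCT emp.empno AS empno, dept.deptno AS deptno
                FROM {JOIN}) AS d
          GROUP BY d.empno) AS b
    ON a.empno IS b.empno
    ORDER BY a.empno
""").fetchall()

# Two-phase shape: one join; the inner aggregate groups by
# (empno, deptno) and counts rows, the outer aggregate sums the
# partial counts and averages the now-distinct deptno values.
two_phase = conn.execute(f"""
    SELECT p.empno, SUM(p.cnt), AVG(p.deptno)
    FROM (SELECT emp.empno AS empno, dept.deptno AS deptno, COUNT(*) AS cnt
          FROM {JOIN}
          GROUP BY emp.empno, dept.deptno) AS p
    GROUP BY p.empno
    ORDER BY p.empno
""").fetchall()

print(original)
print(join_based)
print(two_phase)
```

On this sample data all three queries return the same rows. The rewrite works here because grouping by (empno, deptno) already removes duplicate deptno values within each empno, so the outer AVG needs no DISTINCT, and COUNT(*) can be recovered by summing the per-group counts.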


