drill-issues mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-885) Handle project pushdown for constant expressions
Date Mon, 02 Jun 2014 17:56:01 GMT
Aman Sinha created DRILL-885:

             Summary: Handle project pushdown for constant expressions
                 Key: DRILL-885
                 URL: https://issues.apache.org/jira/browse/DRILL-885
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Aman Sinha

In the following query, the Explain plan shows that the node Project($f0=[0]) projects only
a constant. Ideally, neither side of the join should have to produce any columns other than
those needed for the join condition. Currently, however, the Scan below still produces the
unnecessary columns (see the Customer parquet scan on the left side of the HashJoin), which
hurts performance.

0: jdbc:drill:zk=local> explain plan for select count(*) from (select c.c_custkey, c.c_name,
c.c_address, c.c_nationkey,  c.c_phone, c.c_acctbal, c.c_mktsegment, c.c_comment, n.n_nationkey,
n.n_name, n.n_nationkey, n.n_comment from cp.`tpch/customer.parquet` c JOIN cp.`tpch/nation.parquet`
n ON (c.c_nationkey = n.n_nationkey));
|    text    |    json    |
| 00-00    Screen
00-01      StreamAgg(group=[{}], EXPR$0=[SUM($0)])
00-02        UnionExchange
01-01          StreamAgg(group=[{}], EXPR$0=[COUNT()])
01-02            Project($f0=[0])
01-03              HashJoin(condition=[=($1, $10)], joinType=[inner])
01-05                HashToRandomExchange(dist0=[[$1]])
02-01                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/customer.parquet]],
selectionRoot=/tpch/customer.parquet, columns=[SchemaPath [`c_nationkey`], SchemaPath [`c_custkey`],
SchemaPath [`c_name`], SchemaPath [`c_address`], SchemaPath [`c_phone`], SchemaPath [`c_acctbal`],
SchemaPath [`c_mktsegment`], SchemaPath [`c_comment`]]]])
01-04                Project(*0=[$0], n_nationkey=[$1], n_name=[$2], n_comment=[$3])
01-06                  HashToRandomExchange(dist0=[[$1]])
03-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]],
selectionRoot=/tpch/nation.parquet, columns=[SchemaPath [`n_nationkey`], SchemaPath [`n_name`],
SchemaPath [`n_comment`]]]])

Here's the Drill Logical plan for the same query:
| DrillScreenRel
  DrillAggregateRel(group=[{}], EXPR$0=[COUNT()])
      DrillJoinRel(condition=[=($1, $10)], joinType=[inner])
        DrillScanRel(table=[[cp, tpch/customer.parquet]])
        DrillScanRel(table=[[cp, tpch/nation.parquet]])
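
The column pruning this issue asks for can be illustrated with a small sketch. This is not
Drill's planner code: the function name required_scan_columns and the dictionary layout are
hypothetical, chosen only for illustration, and the column lists are copied from the select
list of the query above. The idea is that a scan needs to produce only the columns referenced
by the projection above the join or by the join condition itself. Because Project($f0=[0])
references no input columns, only the join keys remain.

```python
def required_scan_columns(projected_refs, join_condition_refs, table_columns):
    """Return, per table, the columns its scan actually has to produce.

    Hypothetical helper for illustration only; not part of Drill.
    """
    # A column is needed if the projection or the join condition references it.
    needed = set(projected_refs) | set(join_condition_refs)
    return {
        table: [col for col in cols if col in needed]
        for table, cols in table_columns.items()
    }


# Columns selected from each table in the subquery of the example above.
tables = {
    "customer": [
        "c_custkey", "c_name", "c_address", "c_nationkey",
        "c_phone", "c_acctbal", "c_mktsegment", "c_comment",
    ],
    "nation": ["n_nationkey", "n_name", "n_comment"],
}

# Project($f0=[0]) is a constant, so it references no input columns.
projected_refs = []
# The join condition is c.c_nationkey = n.n_nationkey.
join_condition_refs = ["c_nationkey", "n_nationkey"]

pruned = required_scan_columns(projected_refs, join_condition_refs, tables)
print(pruned)
```

Under these assumptions, each scan would need to read only its join key rather than every
column listed in the subquery.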
