drill-issues mailing list archives

From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-885) Handle project pushdown for constant expressions
Date Thu, 03 Jul 2014 17:51:38 GMT

     [ https://issues.apache.org/jira/browse/DRILL-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-885:
---------------------------------

    Fix Version/s:     (was: 1.0.0-BETA1)
                   Future

> Handle project pushdown for constant expressions
> ------------------------------------------------
>
>                 Key: DRILL-885
>                 URL: https://issues.apache.org/jira/browse/DRILL-885
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Aman Sinha
>            Assignee: Jinfeng Ni
>            Priority: Minor
>             Fix For: Future
>
>
> In the following query, the Explain plan shows that the node Project($f0=[0]) projects
only a constant, so ideally neither side of the join should have to produce any columns
beyond those needed for the join condition. However, the Scan below currently still
produces the unnecessary columns (see the Customer parquet scan on the left side of the
HashJoin). This hurts performance.
> 0: jdbc:drill:zk=local> explain plan for select count(*) from (select c.c_custkey,
c.c_name, c.c_address, c.c_nationkey,  c.c_phone, c.c_acctbal, c.c_mktsegment, c.c_comment,
n.n_nationkey, n.n_name, n.n_nationkey, n.n_comment from cp.`tpch/customer.parquet` c JOIN
cp.`tpch/nation.parquet` n ON (c.c_nationkey = n.n_nationkey));
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      StreamAgg(group=[{}], EXPR$0=[SUM($0)])
> 00-02        UnionExchange
> 01-01          StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 01-02            Project($f0=[0])
> 01-03              HashJoin(condition=[=($1, $10)], joinType=[inner])
> 01-05                HashToRandomExchange(dist0=[[$1]])
> 02-01                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/customer.parquet]],
selectionRoot=/tpch/customer.parquet, columns=[SchemaPath [`c_nationkey`], SchemaPath [`c_custkey`],
SchemaPath [`c_name`], SchemaPath [`c_address`], SchemaPath [`c_phone`], SchemaPath [`c_acctbal`],
SchemaPath [`c_mktsegment`], SchemaPath [`c_comment`]]]])
> 01-04                Project(*0=[$0], n_nationkey=[$1], n_name=[$2], n_comment=[$3])
> 01-06                  HashToRandomExchange(dist0=[[$1]])
> 03-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, columns=[SchemaPath [`n_nationkey`],
SchemaPath [`n_name`], SchemaPath [`n_comment`]]]])
> Here's the Drill Logical plan for the same query:
> | DrillScreenRel
>   DrillAggregateRel(group=[{}], EXPR$0=[COUNT()])
>     DrillProjectRel($f0=[0])
>       DrillJoinRel(condition=[=($1, $10)], joinType=[inner])
>         DrillScanRel(table=[[cp, tpch/customer.parquet]])
>         DrillScanRel(table=[[cp, tpch/nation.parquet]])
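
The column pruning the description asks for can be illustrated with a small sketch. This is not Drill's planner code: the Const and ColRef classes and the required_join_inputs function are hypothetical names made up for illustration. The left-side width of 9 is also an assumption, inferred from $10 in the join condition lining up with n_nationkey at position 1 of the right-side Project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    """A literal in a projection, e.g. the 0 in Project($f0=[0])."""
    value: object


@dataclass(frozen=True)
class ColRef:
    """A reference to an input field by position, e.g. $1."""
    index: int


def referenced_fields(exprs):
    """Collect the input field positions that the expressions actually read."""
    return {e.index for e in exprs if isinstance(e, ColRef)}


def required_join_inputs(project_exprs, join_condition, left_width):
    """Split the fields needed by the project and the join condition
    into positions relative to the left and right join inputs."""
    needed = referenced_fields(project_exprs) | referenced_fields(join_condition)
    left = sorted(i for i in needed if i < left_width)
    right = sorted(i - left_width for i in needed if i >= left_width)
    return left, right


# Project($f0=[0]) above HashJoin(condition=[=($1, $10)])
left, right = required_join_inputs(
    project_exprs=[Const(0)],                # a constant reads no input fields
    join_condition=[ColRef(1), ColRef(10)],  # =($1, $10)
    left_width=9,                            # assumed width of the left input
)
print(left, right)  # [1] [1]
```

Under these assumptions only field 1 on each side (the join key) is required, so a pushdown along these lines would let both scans skip every other column.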



--
This message was sent by Atlassian JIRA
(v6.2#6252)
