drill-issues mailing list archives

From "Pritesh Maker (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (DRILL-5773) Project pushdown into a subquery with select *
Date Wed, 27 Sep 2017 03:03:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker reassigned DRILL-5773:
------------------------------------

    Assignee: Hanumath Rao Maduri

> Project pushdown into a subquery with select *
> ----------------------------------------------
>
>                 Key: DRILL-5773
>                 URL: https://issues.apache.org/jira/browse/DRILL-5773
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Jinfeng Ni
>            Assignee: Hanumath Rao Maduri
>
> If a subquery, table expression, or view uses `select *` and the outer query requests only a
> subset of columns/fields, Drill currently does not push the project down into the subquery.
> As a result, the scan operator returns every column/field in the table, which can significantly
> hurt query performance, especially when the number of columns/fields is large.
> For instance,
> {code}
> SELECT n_regionkey, count(*) AS cnt 
> FROM (SELECT * FROM cp.`tpch/nation.parquet`) AS n 
> GROUP BY n_regionkey;
> {code} 
> Here is the plan
> {code}
> 00-00    Screen
> 00-01      Project(n_regionkey=[$0], cnt=[$1])
> 00-02        Project(n_regionkey=[$0], cnt=[$1])
> 00-03          HashAgg(group=[{0}], cnt=[COUNT()])
> 00-04            Project(n_regionkey=[ITEM($0, 'n_regionkey')])
> 00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
> {code}
> Notice that the Scan operator has `columns = *`, indicating that it will read every column.
> From a performance perspective, Drill should push the project into a subquery that uses `select *`.
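> One way to see the intended effect is to rewrite the example query by hand so the subquery lists only the needed column. This is a sketch of a possible workaround, not something verified in this report; prefixing the query with EXPLAIN PLAN FOR should let you check whether the Scan operator then reports only `n_regionkey` in `columns` rather than `*`.
> {code}
> -- Hypothetical manual rewrite: project only the column the outer query needs.
> EXPLAIN PLAN FOR
> SELECT n_regionkey, count(*) AS cnt
> FROM (SELECT n_regionkey FROM cp.`tpch/nation.parquet`) AS n
> GROUP BY n_regionkey;
> {code}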
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
