drill-issues mailing list archives

From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2748) Filter is not pushed down into subquery with the group by
Date Thu, 17 Sep 2015 21:27:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804541#comment-14804541 ]

Jinfeng Ni commented on DRILL-2748:
-----------------------------------

The unit test case I added in the first patch worked because its filter is on a partition
column. Pushing the filter down leads to partition pruning, which reduces the scan cost.
Therefore, the new plan with the filter pushed down is estimated to have a lower cost.
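
The reasoning above can be illustrated with a toy cost comparison. This is only a sketch under stated assumptions, not Drill's actual cost model: the row count, the partition count, the plan names and the helpers scan_cost and choose_plan are all hypothetical, and the scan is assumed to be the only cost that differs between the two plans.

```python
# Toy illustration only: all numbers and names are hypothetical,
# and this is not how Drill computes plan cost.

def scan_cost(total_rows, partitions, partitions_read):
    """Rows the scan reads when only some partitions are touched."""
    return total_rows * partitions_read / partitions


def choose_plan(costs):
    """Pick the plan name with the lowest estimated cost."""
    return min(costs, key=costs.get)


total_rows = 1_000_000   # assumed table size
partitions = 100         # assumed number of partitions

costs = {
    # Filter stays above the aggregate: every partition is scanned.
    "filter_above_agg": scan_cost(total_rows, partitions, partitions),
    # Filter on the partition column is pushed down: pruning leaves
    # a single partition to scan.
    "filter_pushed_down": scan_cost(total_rows, partitions, 1),
}

best = choose_plan(costs)
print(best, costs[best])
```

In this sketch the pushed-down plan wins only because pruning shrinks the scan. If the filter were on a non-partition column, both plans would scan the same number of rows and this toy model would see no difference between them.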


> Filter is not pushed down into subquery with the group by
> ---------------------------------------------------------
>
>                 Key: DRILL-2748
>                 URL: https://issues.apache.org/jira/browse/DRILL-2748
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 0.9.0, 1.0.0, 1.1.0
>            Reporter: Victoria Markman
>            Assignee: Aman Sinha
>             Fix For: 1.2.0
>
>         Attachments: 0001-DRILL-2748-Improve-cost-estimation-for-Drill-logical.patch
>
>
> I'm not sure about this one; theoretically, the filter could have been pushed into the subquery.
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select x, y, z from (select a1, b1, avg(a1) from t1 group by a1, b1) as sq(x, y, z) where x = 10;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(x=[$0], y=[$1], z=[$2])
> 00-02        Project(x=[$0], y=[$1], z=[CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY NOT NULL])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($0, 10)])
> 00-05              HashAgg(group=[{0, 1}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)])
> 00-06                Project(a1=[$1], b1=[$0])
> 00-07                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/predicates/t1]], selectionRoot=/drill/testdata/predicates/t1, numFiles=1, columns=[`a1`, `b1`]]])
> {code}
> Same with distinct in subquery:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select x, y, z from ( select distinct a1, b1, c1 from t1 ) as sq(x, y, z) where x = 10;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(x=[$0], y=[$1], z=[$2])
> 00-02        Project(x=[$0], y=[$1], z=[$2])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($0, 10)])
> 00-05              HashAgg(group=[{0, 1, 2}])
> 00-06                Project(a1=[$2], b1=[$1], c1=[$0])
> 00-07                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/predicates/t1]], selectionRoot=/drill/testdata/predicates/t1, numFiles=1, columns=[`a1`, `b1`, `c1`]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
