drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3538) We do not prune partitions when we count over partitioning key and filter over partitioning key
Date Sun, 01 Nov 2015 18:32:27 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984483#comment-14984483
] 

Aman Sinha commented on DRILL-3538:
-----------------------------------

[~khfaraaz] I am not sure why you say we are not pruning in cases 1, 2, 3.  The Explain looks
fine to me.  There is no Filter node in the plan which indicates it has been pushed into the
Scan.  The reason you see the Scan showing a PojoRecordReader is that for a trivial COUNT(*)
query on Parquet data, Drill optimizes by reading the row count directly from the metadata
instead of doing it through a separate aggregation.  If you are specifically looking for the
Scan to display the attributes it displays for a regular scan, that's a separate issue. 

> We do not prune partitions when we count over partitioning key and filter over partitioning
key
> -----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-3538
>                 URL: https://issues.apache.org/jira/browse/DRILL-3538
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: 4 node cluster on CentOS
>            Reporter: Khurram Faraaz
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 1.3.0
>
>
> We are not partition pruning when we do a count over partitioning key and when the predicate
involves the partitioning key. CTAS used was,
> {code}
> create table t3214 partition by (key2) as select cast(key1 as double) key1, cast(key2
as char(1)) key2 from `twoKeyJsn.json`;
> {code}
> case 1) We do not do partition pruning in this case.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(key2) from t3214 where
key2 = 'm';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@e2471d7])
> {code}
> case 2) We do not do partition pruning in this case.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(*) from t3214 where key2
= 'm';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@211930a2])
> {code}
> case 3) We do not do partition pruning in this case.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(key1) from t3214 where
key2 = 'm';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@23fea3b0])
> {code}
> case 4) we do prune here.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select avg(key1) from t3214 where key2
= 'm';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), $1)):ANY NOT NULL])
> 00-02        StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])
> 00-03          StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)])
> 00-04            Project(key1=[$1])
> 00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/t3214/0_0_15.parquet]],
selectionRoot=maprfs:/tmp/t3214, numFiles=1, columns=[`key2`, `key1`]]])
> {code}
> case 5) we do prune here.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select min(key1) from t3214 where key2
= 'm';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        StreamAgg(group=[{}], EXPR$0=[MIN($0)])
> 00-03          StreamAgg(group=[{}], EXPR$0=[MIN($0)])
> 00-04            Project(key1=[$1])
> 00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/t3214/0_0_15.parquet]],
selectionRoot=maprfs:/tmp/t3214, numFiles=1, columns=[`key2`, `key1`]]])
> {code}
> commit id that I am testing on : 17e580a7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message