drill-issues mailing list archives

From "Steven Phillips (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3419) Ambiguity in query plan when we do partition pruning
Date Mon, 29 Jun 2015 23:01:05 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606610#comment-14606610 ]

Steven Phillips commented on DRILL-3419:
----------------------------------------

We actually are pruning in case 3. The problem is that every file gets pruned out. We currently
don't handle this case very well, since there is no "Empty Scan" operator. The quick solution
was to scan just one of the files and keep the filter in the plan, so that the rows read from
that file are still discarded. We should figure out a better way to handle this.
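The fallback described above can be modeled in a few lines of Python. This is a minimal sketch, not Drill's planner code: it assumes each Parquet file written by a CTAS with PARTITION BY holds exactly one value of the partition column, the names `ScanPlan`, `like_to_regex`, and `prune` are hypothetical, and the LIKE translation handles only `%` and `_` (no ESCAPE clause).

```python
import re
from dataclasses import dataclass


@dataclass
class ScanPlan:
    files: list        # files the scan will read
    keep_filter: bool  # whether a Filter operator stays above the scan


def like_to_regex(pattern: str) -> "re.Pattern":
    """Translate a SQL LIKE pattern to a regex: '%' = any run, '_' = one char.
    ESCAPE clauses are not handled in this sketch."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out), re.DOTALL)


def prune(partitions: dict, pattern: str) -> ScanPlan:
    """partitions maps file path -> the single partition value in that file.
    Assumptions: one partition value per file, and at least one file."""
    rx = like_to_regex(pattern)
    survivors = [f for f, value in partitions.items() if rx.fullmatch(value)]
    if survivors:
        # Every row in a surviving file matches, so the filter can be dropped.
        return ScanPlan(files=survivors, keep_filter=False)
    # Every file was pruned. With no "empty scan" operator, fall back to
    # scanning one file and keeping the filter, which rejects all of its rows.
    return ScanPlan(files=[next(iter(partitions))], keep_filter=True)


# Hypothetical files and partition values, for illustration only.
parts = {"a.parquet": "AB", "b.parquet": "AZ", "c.parquet": "QQ"}
print(prune(parts, "%Z%"))  # one surviving file, no filter
print(prune(parts, "Z%"))   # all pruned: one file, filter kept
```

With these made-up values, `'Z%'` matches no partition, so the sketch produces a one-file scan with the filter retained: the same shape as the case 3 plan in the issue.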

> Ambiguity in query plan when we do partition pruning
> ----------------------------------------------------
>
>                 Key: DRILL-3419
>                 URL: https://issues.apache.org/jira/browse/DRILL-3419
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.1.0
>            Reporter: Khurram Faraaz
>            Assignee: Steven Phillips
>             Fix For: 1.2.0
>
>
> Note that in cases (1) and (2) we prune; however, it is not clear whether we prune in case
> (3), because we see a FILTER in the query plan for case (3).
> CTAS 
> {code}
> 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE CTAS_ONE_MILN_RWS_PER_GROUP(col1, col2)
> PARTITION BY (col2) AS select cast(columns[0] as bigint) col1, cast(columns[1] as char(2)) col2
> from `millionValGroup.csv`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 1_1       | 21932064                   |
> | 1_0       | 28067936                   |
> +-----------+----------------------------+
> 2 rows selected (73.661 seconds)
> {code}
> case 1)
> {code}
> explain plan for select col1, col2 from CTAS_ONE_MILN_RWS_PER_GROUP where col2 LIKE '%Z%';
> | 00-00    Screen
> 00-01      Project(col1=[$0], col2=[$1])
> 00-02        UnionExchange
> 01-01          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_0_3.parquet],
>                  ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_1_3.parquet]], selectionRoot=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP,
>                  numFiles=2, columns=[`col2`, `col1`]]])
> {code}
> case 2)
> {code}
> explain plan for select col1, col2 from CTAS_ONE_MILN_RWS_PER_GROUP where col2 LIKE 'A%';
> | 00-00    Screen
> 00-01      Project(col1=[$0], col2=[$1])
> 00-02        UnionExchange
> 01-01          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_0_3.parquet],
>                  ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_0_2.parquet],
>                  ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_1_1.parquet],
>                  ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_1_2.parquet],
>                  ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_1_3.parquet],
>                  ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_0_1.parquet]], selectionRoot=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP,
>                  numFiles=6, columns=[`col2`, `col1`]]])
> {code}
> case 3) we are NOT pruning here.
> {code}
> explain plan for select col1, col2 from CTAS_ONE_MILN_RWS_PER_GROUP where col2 LIKE 'Z%';
> | 00-00    Screen
> 00-01      Project(col1=[$1], col2=[$0])
> 00-02        SelectionVectorRemover
> 00-03          Filter(condition=[LIKE($0, 'Z%')])
> 00-04            Project(col2=[$1], col1=[$0])
> 00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_1_48.parquet]],
>                      selectionRoot=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP, numFiles=1, columns=[`col2`, `col1`]]])
> {code}
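When comparing the three plans, the two signals that matter are the `numFiles` count in the Scan and whether a `Filter` operator survives above it. A small Python helper makes that check explicit; `summarize_plan` is a hypothetical name, and it is a regex heuristic over the EXPLAIN text, not a Drill API.

```python
import re


def summarize_plan(plan_text: str) -> dict:
    """Pull the scanned-file count and the presence of a Filter operator
    out of Drill EXPLAIN PLAN text (a text heuristic, not a Drill API)."""
    files = re.search(r"numFiles=(\d+)", plan_text)
    return {
        "num_files": int(files.group(1)) if files else None,
        "has_filter": re.search(r"\bFilter\(", plan_text) is not None,
    }


# Operator lines copied from the case 3 plan.
case_3 = """\
00-03          Filter(condition=[LIKE($0, 'Z%')])
00-04            Project(col2=[$1], col1=[$0])
00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_1_48.parquet]], selectionRoot=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP, numFiles=1, columns=[`col2`, `col1`]]])
"""
print(summarize_plan(case_3))  # {'num_files': 1, 'has_filter': True}
```

Per the comment above, a single-file scan topped by a Filter on the partition column is what a fully pruned scan currently looks like. On its own, though, this heuristic cannot tell that case apart from a filter that was never pushed into the scan at all.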



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
