drill-issues mailing list archives

From "Steven Phillips (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't
Date Sat, 27 Jun 2015 02:25:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603909#comment-14603909 ]

Steven Phillips commented on DRILL-3410:
----------------------------------------

This appears to be caused by the FindPartitionConditions class, which is the code that walks
the expression tree and determines whether pruning is valid. It assumes that the "binary"
operators "OR" and "AND" always have exactly two arguments. However, as you can see from the
expression in the plan:

{code}
OR(AND(=($1, 1993), >(ITEM($2, 0), 29600)), =($1, 1994), >(ITEM($2, 0), 29700))
{code}

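the expression was rewritten as a single OR operator with three arguments. An OR can only be
used for pruning if every one of its operands restricts the partition column; the third operand,
>(ITEM($2, 0), 29700), refers only to the non-partition column columns[0], so no partition
should be pruned here.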
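To make the failure mode concrete, here is a minimal Java 17 sketch under stated assumptions. It is not Drill's FindPartitionConditions: the classes `Expr`, `Col`, `Lit`, `Call` and the method `prunable` are hypothetical. It models OR and AND with any number of operands and treats an OR as usable for pruning only when every operand restricts partition columns. Setting `binaryOnly` mimics the two-argument assumption by looking only at the first two operands, which silently drops `>(ITEM($2, 0), 29700)` from the plan's filter.

```java
import java.util.List;

/** Hypothetical sketch of a pruning check; NOT Drill's FindPartitionConditions. */
public class PruneCheck {
    interface Expr {}
    /** Column reference; isPartition marks directory columns such as dir0 ($1 in the plan). */
    record Col(String name, boolean isPartition) implements Expr {}
    record Lit(Object value) implements Expr {}
    /** Operator call; OR and AND may carry any number of operands. */
    record Call(String op, List<Expr> args) implements Expr {}

    /**
     * Returns true if a partition filter can be derived from e.
     * With binaryOnly set, OR/AND look only at their first two operands,
     * mimicking the assumption described in the comment (hypothetical model).
     */
    static boolean prunable(Expr e, boolean binaryOnly) {
        if (e instanceof Col c) return c.isPartition();
        if (e instanceof Lit) return true;
        Call call = (Call) e;
        boolean logical = call.op().equals("OR") || call.op().equals("AND");
        List<Expr> args = (binaryOnly && logical && call.args().size() > 2)
                ? call.args().subList(0, 2) // silently drops operands 3..n
                : call.args();
        return switch (call.op()) {
            // Every disjunct must restrict partitions; otherwise any partition may hold matches.
            case "OR" -> args.stream().allMatch(a -> prunable(a, binaryOnly));
            // One conjunct that restricts partitions is enough.
            case "AND" -> args.stream().anyMatch(a -> prunable(a, binaryOnly));
            // Comparisons, ITEM, etc.: usable only if every input is a partition column or literal.
            default -> args.stream().allMatch(a -> prunable(a, binaryOnly));
        };
    }

    static Expr dir0() { return new Col("dir0", true); }
    static Expr col0() { return new Call("ITEM", List.of(new Col("columns", false), new Lit(0))); }
    static Expr cmp(String op, Expr left, int value) { return new Call(op, List.of(left, new Lit(value))); }

    /** OR(AND(=($1, 1993), >(ITEM($2, 0), 29600)), =($1, 1994), >(ITEM($2, 0), 29700)) */
    static Expr planFilter() {
        return new Call("OR", List.of(
                new Call("AND", List.of(cmp("=", dir0(), 1993), cmp(">", col0(), 29600))),
                cmp("=", dir0(), 1994),
                cmp(">", col0(), 29700)));
    }

    public static void main(String[] args) {
        System.out.println("binary-only check: " + prunable(planFilter(), true));  // true: prunes wrongly
        System.out.println("n-ary check:       " + prunable(planFilter(), false)); // false: scan all files
    }
}
```

Running `main` prints `true` for the binary-only check and `false` for the n-ary one; the n-ary result matches the reporter's expectation that all files be scanned.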

Rewriting the expression so that OR and AND are true binary operators, each with exactly two
arguments, seems to fix the problem. I will have a patch available shortly.
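Rewriting into true binary operators can be sketched as a left-deep fold: OR(a, b, c) becomes OR(OR(a, b), c), which preserves meaning because OR and AND are associative. This is a hypothetical Java 17 sketch, not the actual patch: the `Leaf`, `Call`, `binarize`, and `render` names are illustrative, and comparisons are kept as opaque text. The real fix may instead take a different approach, such as making the tree walker iterate over all operands.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: rewrite n-ary OR/AND calls into nested two-argument calls. */
public class Binarize {
    interface Expr {}
    /** Opaque leaf, e.g. a comparison such as =($1, 1994), kept as text. */
    record Leaf(String text) implements Expr {}
    record Call(String op, List<Expr> args) implements Expr {}

    static Expr binarize(Expr e) {
        if (!(e instanceof Call call)) return e;
        List<Expr> args = new ArrayList<>();
        for (Expr a : call.args()) args.add(binarize(a)); // rewrite children first
        boolean logical = call.op().equals("OR") || call.op().equals("AND");
        if (!logical || args.size() <= 2) return new Call(call.op(), args);
        // OR(a, b, c) -> OR(OR(a, b), c): safe because OR and AND are associative.
        Expr acc = new Call(call.op(), List.of(args.get(0), args.get(1)));
        for (int i = 2; i < args.size(); i++) {
            acc = new Call(call.op(), List.of(acc, args.get(i)));
        }
        return acc;
    }

    /** Prints the tree in the same prefix notation the plan uses. */
    static String render(Expr e) {
        if (e instanceof Leaf leaf) return leaf.text();
        Call call = (Call) e;
        return call.op() + "(" + call.args().stream().map(Binarize::render)
                .collect(Collectors.joining(", ")) + ")";
    }

    /** The filter from the DRILL-3410 plan, with comparisons kept as opaque leaves. */
    static Expr planFilter() {
        return new Call("OR", List.of(
                new Call("AND", List.of(new Leaf("=($1, 1993)"), new Leaf(">(ITEM($2, 0), 29600)"))),
                new Leaf("=($1, 1994)"),
                new Leaf(">(ITEM($2, 0), 29700)")));
    }

    public static void main(String[] args) {
        System.out.println(render(binarize(planFilter())));
    }
}
```

For the plan's filter, `main` prints `OR(OR(AND(=($1, 1993), >(ITEM($2, 0), 29600)), =($1, 1994)), >(ITEM($2, 0), 29700))`, in which every OR and AND has exactly two arguments, so a walker that only inspects two operands no longer loses the third condition.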

> Partition Pruning : We are doing a prune when we shouldn't
> ----------------------------------------------------------
>
>                 Key: DRILL-3410
>                 URL: https://issues.apache.org/jira/browse/DRILL-3410
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Rahul Challapalli
>            Assignee: Steven Phillips
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> git.commit.id.abbrev=60bc945
> The plan below does not look right. Based on the filters in the query, it should scan all the files. Also, Hive returned more rows than Drill.
> {code}
> explain plan for select * from `existing_partition_pruning/lineitempart` where (dir0=1993 and columns[0] >29600) or (dir0=1994 or columns[0]>29700);
> | 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Project(T70¦¦*=[$0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[OR(AND(=($1, 1993), >(ITEM($2, 0), 29600)), =($1, 1994), >(ITEM($2, 0), 29700))])
> 00-05              Project(T70¦¦*=[$0], dir0=[$1], columns=[$2])
> 00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitempart/0_0_3.parquet], ReadEntryWithPath [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitempart/0_0_4.parquet]], selectionRoot=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitempart, numFiles=2, columns=[`*`]]])
>  |
> {code}
> I have attached the data set used. Let me know if you need anything more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
