drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steven Phillips (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-3418) Partition Pruning : We are over-pruning and this leads to wrong results
Date Mon, 29 Jun 2015 20:43:05 GMT

     [ https://issues.apache.org/jira/browse/DRILL-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Steven Phillips updated DRILL-3418:
-----------------------------------
    Attachment: DRILL-3418.patch

> Partition Pruning : We are over-pruning and this leads to wrong results
> -----------------------------------------------------------------------
>
>                 Key: DRILL-3418
>                 URL: https://issues.apache.org/jira/browse/DRILL-3418
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Rahul Challapalli
>            Assignee: Steven Phillips
>            Priority: Critical
>             Fix For: 1.1.0
>
>         Attachments: DRILL-3418.patch
>
>
> git.commit.id.abbrev=c199860
> We are over-pruning based on the below plan.
> {code}
> explain plan for select * from `existing_partition_pruning/lineitem_hierarchical_intstring`
where (dir0=1993 or dir1='jun') and (dir0=1991 or dir1='aug' or columns[0] > 5000);
> 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Project(T17¦¦*=[$0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[AND(OR(=($1, 1993), =($2, 'jun')), OR(=($1, 1991),
=($2, 'aug'), >(ITEM($3, 0), 5000)))])
> 00-05              Project(T17¦¦*=[$0], dir0=[$1], dir1=[$2], columns=[$3])
> 00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitem_hierarchical_intstring/0_0_26.parquet],
ReadEntryWithPath [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitem_hierarchical_intstring/0_0_7.parquet]],
selectionRoot=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitem_hierarchical_intstring,
numFiles=2, columns=[`*`]]])
> {code}
> We have 7 year partitions and each year has 12 further month partitions.
> I executed a count(*) query based on the above filters with & without partitioning
in place and the values were different
> Without Partitions :
> {code}
> select count(*) from tbl_nopartitions where (dir0=1993 or dir1='jun') and (dir0=1991
or dir1='aug' or columns[0] > 5000);
> +---------+
> | EXPR$0  |
> +---------+
> | 14515   |
> +---------+
> 1 row selected (0.903 seconds)
> {code}
> With Partitions :
> {code}
> select count(*) from `existing_partition_pruning/lineitem_hierarchical_intstring` where
(dir0=1993 or dir1='jun') and (dir0=1991 or dir1='aug' or columns[0] > 5000);
> +---------+
> | EXPR$0  |
> +---------+
> | 1800    |
> +---------+
> 1 row selected (0.49 seconds)
> {code}
> The data is larger than 10 MB to upload here



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message