drill-issues mailing list archives

From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-2761) ParquetGroupScan copy constructor only copy reference, leading to out-sync ParquetGroupScan instance.
Date Sat, 18 Apr 2015 21:07:58 GMT

     [ https://issues.apache.org/jira/browse/DRILL-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni updated DRILL-2761:
------------------------------
    Attachment: 0003-DRILL-2761-ParquetGroupScan-s-copy-constructor-shoul.patch

> ParquetGroupScan copy constructor only copy reference, leading to out-sync ParquetGroupScan
instance.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2761
>                 URL: https://issues.apache.org/jira/browse/DRILL-2761
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>         Attachments: 0003-DRILL-2761-ParquetGroupScan-s-copy-constructor-shoul.patch
>
>
> ParquetGroupScan has one copy constructor, which the project pushdown rule and the partition
pruning rule use to clone a modified version of the original ParquetGroupScan instance.
However, the copy constructor copies only the references to several Collections. If the cloned
instance modifies those collections, it also modifies the contents of the collections in the
original ParquetGroupScan instance, leaving the original instance in an invalid state. Such an
invalid state can lead to incorrect query results.
> For instance, consider the query:
> {code}
> select O_ORDERKEY,O_CUSTKEY,O_CLERK,O_COMMENT,dir0 
> from `/drill/testdata/partition_pruning/dfs/orders` 
> where (dir0=1993)
> {code}
> Assume the data is partitioned by year (1993, 1994, 1995). Depending on the order in which
the RelOptRules fire, a ParquetGroupScan could end up with its "rowGroupInfos" list out of sync
with its "entries" list. The optimizer then treats the partition filter as pushed down:
"entries" is modified and the filter is removed from the plan, yet "rowGroupInfos" still holds
the original contents. As a result, the query returns unwanted rows.
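
The aliasing problem described above can be illustrated with a minimal Java sketch. This is not Drill's ParquetGroupScan code: the class `ScanCopyDemo.Scan`, the methods `shallowCopy`, `deepCopy`, `pruneTo` and `newScan`, and the file paths are all hypothetical, and the two lists only borrow the names "entries" and "rowGroupInfos" from the issue. The sketch assumes a pruning step that edits "entries" in place, which is enough to show how a reference-only copy lets a clone corrupt the original.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a group scan; NOT Drill's actual ParquetGroupScan.
public class ScanCopyDemo {

    public static final class Scan {
        public final List<String> entries;        // selected files
        public final List<String> rowGroupInfos;  // row groups derived from the files

        Scan(List<String> entries, List<String> rowGroupInfos) {
            this.entries = entries;
            this.rowGroupInfos = rowGroupInfos;
        }

        // Copies only the references: the clone and the original share both lists.
        public static Scan shallowCopy(Scan that) {
            return new Scan(that.entries, that.rowGroupInfos);
        }

        // Copies the list contents: the clone owns independent lists.
        public static Scan deepCopy(Scan that) {
            return new Scan(new ArrayList<>(that.entries),
                            new ArrayList<>(that.rowGroupInfos));
        }

        // Simplified pruning (assumption): drops entries outside the partition,
        // editing the list in place. A real scan would also rebuild its row groups.
        public void pruneTo(String partition) {
            entries.removeIf(e -> !e.contains(partition));
        }
    }

    // Builds a scan over three hypothetical yearly partitions.
    public static Scan newScan() {
        List<String> files = Arrays.asList(
            "/orders/1993/a.parquet",
            "/orders/1994/b.parquet",
            "/orders/1995/c.parquet");
        return new Scan(new ArrayList<>(files), new ArrayList<>(files));
    }

    public static void main(String[] args) {
        Scan first = newScan();
        Scan.shallowCopy(first).pruneTo("/1993/");
        // The original was changed through the clone and is now out of sync.
        System.out.println("shallow copy: original entries=" + first.entries.size()
            + ", rowGroupInfos=" + first.rowGroupInfos.size());

        Scan second = newScan();
        Scan.deepCopy(second).pruneTo("/1993/");
        // The original is untouched.
        System.out.println("deep copy: original entries=" + second.entries.size()
            + ", rowGroupInfos=" + second.rowGroupInfos.size());
    }
}
```

With the shallow copy, the original ends up with one entry but three row groups, which mirrors the mismatch between "entries" and "rowGroupInfos" in the report. With the deep copy, the original keeps all three of each. Copying the list contents in the copy constructor is one plausible way to avoid the shared state; the attached patch may take a different approach.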



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
