drill-issues mailing list archives

From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-4467) Invalid projection created using PrelUtil.getColumns
Date Fri, 04 Mar 2016 11:20:41 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179759#comment-15179759
] 

Jacques Nadeau edited comment on DRILL-4467 at 3/4/16 11:20 AM:
----------------------------------------------------------------

This lack of stability is also causing incorrect plans. For example, the plan for this regression
test is invalid (but may execute correctly because Drill resolves using names rather than
ordinals):

https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/hbase/hbase_pushdown/plan/pushdown_p3.e_tsv

{code:title=Plan in current test framework (wrong, current master)}
    Screen
      Project(EXPR$0=[/(CAST($1):INTEGER, CAST($2):FLOAT)])
        Project(row_key=[$1], ITEM=[ITEM($2, 'age')], ITEM2=[ITEM($0, 'gpa')])
          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=student,
startRow=750\x00, stopRow=800, filter=FilterList AND (2/2): [RowFilter (LESS, 800), RowFilter
(GREATER, 750)]], columns=[`row_key`, `twocf`.`age`, `threecf`.`gpa`]]])
{code}

But once we apply the desiredFields LinkedHashSet fix, the ordinals in the project above the
Scan are stable and correct:
{code:title=Plan using LinkedHashSet fix}
00-00    Screen
00-01      Project(EXPR$0=[/(CAST($1):INTEGER, CAST($2):FLOAT)])
00-02        Project(row_key=[$0], ITEM=[ITEM($1, 'age')], ITEM2=[ITEM($2, 'gpa')])
00-03          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=student,
startRow=750\x00, stopRow=800, filter=FilterList AND (2/2): [RowFilter (LESS, 800), RowFilter
(GREATER, 750)]], columns=[`row_key`, `twocf`.`age`, `threecf`.`gpa`]]])
{code}

Note how the columns list in the scan is now consistent with the field indices in the project.
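
To make the mismatch concrete, here is a minimal sketch that resolves the ordinals from the two plans against the scan's columns list. It assumes each input reference ($0, $1, $2) indexes into that list by position; the class and method names are hypothetical and are not part of Drill.

```java
import java.util.Arrays;
import java.util.List;

public class OrdinalCheckSketch {

    /** Returns the scan column that input reference $ordinal points at (assumed positional). */
    static String resolve(List<String> scanColumns, int ordinal) {
        return scanColumns.get(ordinal);
    }

    /** True when the i-th ordinal resolves to the i-th expected column. */
    static boolean consistent(List<String> scanColumns, List<String> expected, int[] ordinals) {
        for (int i = 0; i < ordinals.length; i++) {
            if (!resolve(scanColumns, ordinals[i]).equals(expected.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Column order as printed in the Scan of both plans.
        List<String> scanColumns = Arrays.asList("row_key", "twocf", "threecf");
        // What the project intends to read: row_key, then the age family, then the gpa family.
        List<String> expected = Arrays.asList("row_key", "twocf", "threecf");

        // Ordinals from the plan on current master: $1, $2, $0.
        System.out.println(consistent(scanColumns, expected, new int[] {1, 2, 0})); // false
        // Ordinals from the plan with the LinkedHashSet fix: $0, $1, $2.
        System.out.println(consistent(scanColumns, expected, new int[] {0, 1, 2})); // true
    }
}
```

With the ordinals from current master, row_key would be read from position 1, which holds `twocf`, so the check fails. With the fixed ordinals, every reference lands on the column it names.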


was (Author: jnadeau):
This lack of stability also is causing incorrect plans, for example, the plan for this regression
test is invalid (but may execute correctly because Drill resolves using names rather than
ordinals):

https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/hbase/hbase_pushdown/plan/pushdown_p3.e_tsv

{code:title=PlanWithoutStability (wrong)}
    Screen
      Project(EXPR$0=[/(CAST($1):INTEGER, CAST($2):FLOAT)])
        Project(row_key=[$1], ITEM=[ITEM($2, 'age')], ITEM2=[ITEM($0, 'gpa')])
          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=student,
startRow=750\x00, stopRow=800, filter=FilterList AND (2/2): [RowFilter (LESS, 800), RowFilter
(GREATER, 750)]], columns=[`row_key`, `twocf`.`age`, `threecf`.`gpa`]]])
{code}

But once we apply the desiredFields LinkedHashSet fix, we see stability/correct ordinals in
the project above the Scan:
{code:title=PlanWithStability}
00-00    Screen
00-01      Project(EXPR$0=[/(CAST($1):INTEGER, CAST($2):FLOAT)])
00-02        Project(row_key=[$0], ITEM=[ITEM($1, 'age')], ITEM2=[ITEM($2, 'gpa')])
00-03          Scan(groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=student,
startRow=750\x00, stopRow=800, filter=FilterList AND (2/2): [RowFilter (LESS, 800), RowFilter
(GREATER, 750)]], columns=[`row_key`, `twocf`.`age`, `threecf`.`gpa`]]])
{code}


> Invalid projection created using PrelUtil.getColumns
> ----------------------------------------------------
>
>                 Key: DRILL-4467
>                 URL: https://issues.apache.org/jira/browse/DRILL-4467
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Laurent Goujon
>            Assignee: Jacques Nadeau
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> In {{DrillPushProjIntoScan}}, a new scan and a new projection are created using {{PrelUtil#getColumns(RelDataType, List<RexNode>)}}.
> The returned {{ProjectPushInfo}} instance has several fields, one of which is {{desiredFields}}, the list of projected fields. There is one instance per {{RexNode}}, but because the instances were initially added to a set, they might not be in the order in which they were created.
> The issue happens in the following code:
> {code:java}
>       List<RexNode> newProjects = Lists.newArrayList();
>       for (RexNode n : proj.getChildExps()) {
>         newProjects.add(n.accept(columnInfo.getInputRewriter()));
>       }
> {code}
> This code creates a new list of projects out of the initial ones by mapping the indices from the old projects to the new projects, but the indices of the new RexNode instances might be out of order (because of the ordering of desiredFields). If the indices are out of order, the check {{ProjectRemoveRule.isTrivial(newProj)}} will fail.
> My guess is that desiredFields ordering should be preserved when instances are added, to satisfy the condition above.
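
The ordering concern in the description can be illustrated with plain Java collections. This is only a sketch: the `collect` helper and the use of strings for fields are hypothetical stand-ins, not Drill's actual `PrelUtil` code. It shows that a `LinkedHashSet` iterates in insertion order, so ordinals derived from it follow the order in which fields were first referenced, whereas a `HashSet` makes no such guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DesiredFieldsOrderSketch {

    /**
     * Hypothetical stand-in for collecting referenced fields into a set.
     * The position of each name in the returned list plays the role of its new ordinal.
     */
    static List<String> collect(Set<String> desiredFields, List<String> referencedInOrder) {
        for (String name : referencedInOrder) {
            desiredFields.add(name); // repeated references are ignored by the set
        }
        return new ArrayList<>(desiredFields); // iteration order decides the ordinals
    }

    public static void main(String[] args) {
        List<String> refs = Arrays.asList("row_key", "twocf.age", "threecf.gpa", "row_key");

        // Insertion order is preserved: [row_key, twocf.age, threecf.gpa]
        System.out.println("LinkedHashSet: " + collect(new LinkedHashSet<>(), refs));

        // Iteration order depends on hashing and is not guaranteed to match insertion order.
        System.out.println("HashSet:       " + collect(new HashSet<>(), refs));
    }
}
```

The design point is that only the set implementation changes; the deduplication behaviour is the same in both cases, but only the insertion-ordered set gives ordinals that line up with the order of the original references.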



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
