drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation
Date Wed, 18 Mar 2015 02:29:38 GMT

     [ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Aman Sinha updated DRILL-2488:
------------------------------
    Attachment: 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch

It turns out to be an issue with supported encoding for MergeJoin.  Merge Join execution operator
currently does not process incoming batches with SV2 or SV4, so if there was a Limit or Sort
below, we need to insert a SelectionVectorRemover below the MJ.  
Uploaded a patch with a simple fix.  [~vkorukanti] could you please review ?  I haven't run
all tests yet..still in process.

> Wrong result on join between two subqueries with aggregation
> ------------------------------------------------------------
>
>                 Key: DRILL-2488
>                 URL: https://issues.apache.org/jira/browse/DRILL-2488
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 0.9.0
>
>         Attachments: 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch,
t1.parquet
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+------------+
> |     a1     |     b1     |     c1     |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> | 4          | null       | 2015-01-04 |
> | 5          | eeeee      | 2015-01-05 |
> | 6          | fffff      | 2015-01-06 |
> | 7          | ggggg      | 2015-01-07 |
> | null       | hhhhh      | 2015-01-08 |
> | 9          | iiiii      | null       |
> | 10         | jjjjj      | 2015-01-10 |
> +------------+------------+------------+
> 10 rows selected (0.15 seconds)
> {code}
> This result is incorrect, one row is missing
> {code}
> 0: jdbc:drill:schema=dfs> select * from
> . . . . . . . . . . . . > (
> . . . . . . . . . . . . >                 select
> . . . . . . . . . . . . >                         b1,
> . . . . . . . . . . . . >                         count(distinct a1)
> . . . . . . . . . . . . >                 from
> . . . . . . . . . . . . >                         t1
> . . . . . . . . . . . . >                 group by
> . . . . . . . . . . . . >                         b1
> . . . . . . . . . . . . >                 order by
> . . . . . . . . . . . . >                         b1 limit 5 offset 1
> . . . . . . . . . . . . >         ) as sq1(x1, y1)
> . . . . . . . . . . . . >
> . . . . . . . . . . . . >         inner join
> . . . . . . . . . . . . >
> . . . . . . . . . . . . >         (
> . . . . . . . . . . . . >                 select
> . . . . . . . . . . . . >                         b1,
> . . . . . . . . . . . . >                         count(distinct a1)
> . . . . . . . . . . . . >                 from
> . . . . . . . . . . . . >                         t1
> . . . . . . . . . . . . >                 group by
> . . . . . . . . . . . . >                         b1
> . . . . . . . . . . . . >                 order by
> . . . . . . . . . . . . >                         b1 limit 5 offset 1
> . . . . . . . . . . . . >         ) as sq2(x1, y1)
> . . . . . . . . . . . . >         on
> . . . . . . . . . . . . >                 sq1.x1 = sq2.x1 and
> . . . . . . . . . . . . >                 sq2.y1 = sq2.y1
> . . . . . . . . . . . . > ;
> +------------+------------+------------+------------+
> |     x1     |     y1     |    x10     |    y10     |
> +------------+------------+------------+------------+
> | bbbbb      | 1          | bbbbb      | 1          |
> | ccccc      | 1          | ccccc      | 1          |
> | eeeee      | 1          | eeeee      | 1          |
> | fffff      | 1          | fffff      | 1          |
> +------------+------------+------------+------------+
> 4 rows selected (0.28 seconds)
> {code}
> Explain plan for the wrong result:
> {code}
> 00-01      Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
> 00-02        Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
> 00-03          MergeJoin(condition=[=($0, $2)], joinType=[inner])
> 00-05            Limit(offset=[1], fetch=[5])
> 00-07              StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
> 00-09                Sort(sort0=[$0], dir0=[ASC])
> 00-11                  StreamAgg(group=[{0, 1}])
> 00-13                    Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
> 00-15                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1,
numFiles=1, columns=[`b1`, `a1`]]])
> 00-04            Project(b10=[$0], EXPR$10=[$1])
> 00-06              SelectionVectorRemover
> 00-08                Sort(sort0=[$0], dir0=[ASC])
> 00-10                  Filter(condition=[=($1, $1)])
> 00-12                    Limit(offset=[1], fetch=[5])
> 00-14                      StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
> 00-16                        Sort(sort0=[$0], dir0=[ASC])
> 00-17                          StreamAgg(group=[{0, 1}])
> 00-18                            Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
> 00-19                              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1,
numFiles=1, columns=[`b1`, `a1`]]])
> {code}
> If you turn off merge join, query returns correct result:
> {code}
> 0: jdbc:drill:schema=dfs> select * from
> . . . . . . . . . . . . > (
> . . . . . . . . . . . . >     select
> . . . . . . . . . . . . >         b1,
> . . . . . . . . . . . . >         count(distinct a1)
> . . . . . . . . . . . . >     from
> . . . . . . . . . . . . >         t1
> . . . . . . . . . . . . >     group by
> . . . . . . . . . . . . >         b1
> . . . . . . . . . . . . >     order by
> . . . . . . . . . . . . >         b1 limit 5 offset 1
> . . . . . . . . . . . . >  ) as sq1(x1, y1)
> . . . . . . . . . . . . >
> . . . . . . . . . . . . >         inner join
> . . . . . . . . . . . . > (
> . . . . . . . . . . . . >     select
> . . . . . . . . . . . . >         b1,
> . . . . . . . . . . . . >         count(distinct a1)
> . . . . . . . . . . . . >     from
> . . . . . . . . . . . . >         t1
> . . . . . . . . . . . . >     group by
> . . . . . . . . . . . . >         b1
> . . . . . . . . . . . . >     order by
> . . . . . . . . . . . . >           b1 limit 5 offset 1
> . . . . . . . . . . . . > ) as sq2(x1, y1)
> . . . . . . . . . . . . > on
> . . . . . . . . . . . . >                 sq1.x1 = sq2.x1 and
> . . . . . . . . . . . . >                 sq2.y1 = sq2.y1
> . . . . . . . . . . . . > ;
> +------------+------------+------------+------------+
> |     x1     |     y1     |    x10     |    y10     |
> +------------+------------+------------+------------+
> | bbbbb      | 1          | bbbbb      | 1          |
> | ccccc      | 1          | ccccc      | 1          |
> | eeeee      | 1          | eeeee      | 1          |
> | fffff      | 1          | fffff      | 1          |
> | ggggg      | 1          | ggggg      | 1          |
> +------------+------------+------------+------------+
> 5 rows selected (0.352 seconds)
> {code}
> cut/paste reproduction
> {code:sql}
> select * from
>  (
>                  select 
>                          b1,                      
>                          count(distinct a1)       
>                  from 
>                          t1                       
>                  group by                 
>                          b1                       
>                  order by                 
>                          b1 limit 5 offset 1      
>          ) as sq1(x1, y1)         
>   
>          inner join 
>   
>          (
>                  select 
>                          b1,                      
>                          count(distinct a1)       
>                  from 
>                          t1                       
>                  group by                 
>                          b1                       
>                  order by                 
>                          b1 limit 5 offset 1      
>          ) as sq2(x1, y1)         
>          on 
>                  sq1.x1 = sq2.x1 and      
>                  sq2.y1 = sq2.y1          
>  ;
> {code}
> This test has been running and passing in regression test suite until framework was switched
to a new code, where JSON parsing was replaced with jackson  and for a brief period ( I believe
Friday afternoon until now ) this suite was not executed.
> We already have a merge join bug DRILL-2010, but this one seems to be different (small
data set) and feels like a recent regression. 
> For QA: test Functional/Passing/aggregation/sanity/q18.sql will be running with merge
join disabled until this issue is fixed. Will need to remove alter session from this file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message