drill-issues mailing list archives

From "Victoria Markman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2247) SUM with CASE statement on the column of the null producing side of left outer join returns wrong result
Date Mon, 06 Jul 2015 21:49:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615727#comment-14615727 ]

Victoria Markman commented on DRILL-2247:
-----------------------------------------

1. Column c1 in x1.parquet and x2.parquet is of decimal data type (before the type was disabled in 1.0).
2. I can't reproduce this anymore: I tried to force the plan that is posted in the bug by disabling merge join and setting `planner.slice_target` = 1, and I get the correct result (session settings sketched below).
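
For reference, a minimal sketch of the session settings I used to force those plans (the `planner.enable_mergejoin` option name is my assumption here; only `planner.slice_target` appears verbatim in the transcripts below):

{code:sql}
-- Assumed option name: turn merge join off so the planner picks a hash join.
alter session set `planner.enable_mergejoin` = false;

-- Force a parallel, exchange-based plan (shown verbatim in the last transcript below).
alter session set `planner.slice_target` = 1;
{code}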

With 1.1 I get:

Default plan with all default settings (result is correct):
{code}
0: jdbc:drill:schema=dfs> explain plan for select
. . . . . . . . . . . . >         count(*)     as count_star,
. . . . . . . . . . . . >         sum(x2.c1)   as simle_sum,
. . . . . . . . . . . . >         round(avg(x2.c1))  as round_avg_x2_c1,
. . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         x1 left outer join x2 on x1.c1 = x2.c1
. . . . . . . . . . . . > group by
. . . . . . . . . . . . >         x1.c1;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(count_star=[$0], simle_sum=[$1], round_avg_x2_c1=[$2], sum_with_case=[$3])
00-02        Project(count_star=[$1], simle_sum=[CASE(=($3, 0), null, $2)], round_avg_x2_c1=[ROUND(CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY NOT NULL)], sum_with_case=[CASE(=($5, 0), null, $4)])
00-03          HashAgg(group=[{0}], count_star=[COUNT()], agg#1=[$SUM0($1)], agg#2=[COUNT($1)], agg#3=[$SUM0($2)], agg#4=[COUNT($2)])
00-04            Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY, ROUND(/($1, 12)))])
00-05              Project(c1=[$1], c10=[$0])
00-06                MergeJoin(condition=[=($1, $0)], joinType=[right])
00-08                  SelectionVectorRemover
00-10                    Sort(sort0=[$0], dir0=[ASC])
00-12                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/x2]], selectionRoot=maprfs:/drill/testdata/subqueries/x2, numFiles=1, columns=[`c1`]]])
00-07                  Project(c10=[$0])
00-09                    SelectionVectorRemover
00-11                      Sort(sort0=[$0], dir0=[ASC])
00-13                        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/x1]], selectionRoot=maprfs:/drill/testdata/subqueries/x1, numFiles=1, columns=[`c1`]]])
{code}

With merge join disabled (correct result):
{code}
0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . >         count(*)     as count_star,
. . . . . . . . . . . . >         sum(x2.c1)   as simle_sum,
. . . . . . . . . . . . >         round(avg(x2.c1))  as round_avg_x2_c1,
. . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         `x1.parquet` x1 left outer join `x2.parquet` x2 on x1.c1 = x2.c1
. . . . . . . . . . . . > group by
. . . . . . . . . . . . >         x1.c1;
+-------------+------------+------------------+----------------+
| count_star  | simle_sum  | round_avg_x2_c1  | sum_with_case  |
+-------------+------------+------------------+----------------+
| 1           | null       | null             | null           |
+-------------+------------+------------------+----------------+
1 row selected (0.517 seconds)
0: jdbc:drill:schema=dfs> explain plan for select
. . . . . . . . . . . . >         count(*)     as count_star,
. . . . . . . . . . . . >         sum(x2.c1)   as simle_sum,
. . . . . . . . . . . . >         round(avg(x2.c1))  as round_avg_x2_c1,
. . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         `x1.parquet` x1 left outer join `x2.parquet` x2 on x1.c1 = x2.c1
. . . . . . . . . . . . > group by
. . . . . . . . . . . . >         x1.c1;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(count_star=[$0], simle_sum=[$1], round_avg_x2_c1=[$2], sum_with_case=[$3])
00-02        Project(count_star=[$1], simle_sum=[CASE(=($3, 0), null, $2)], round_avg_x2_c1=[ROUND(CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY NOT NULL)], sum_with_case=[CASE(=($5, 0), null, $4)])
00-03          HashAgg(group=[{0}], count_star=[COUNT()], agg#1=[$SUM0($1)], agg#2=[COUNT($1)], agg#3=[$SUM0($2)], agg#4=[COUNT($2)])
00-04            Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY, ROUND(/($1, 12)))])
00-05              Project(c1=[$1], c10=[$0])
00-06                HashJoin(condition=[=($1, $0)], joinType=[right])
00-08                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/x2.parquet]], selectionRoot=maprfs:/drill/testdata/subqueries/x2.parquet, numFiles=1, columns=[`c1`]]])
00-07                  Project(c10=[$0])
00-09                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/x1.parquet]], selectionRoot=maprfs:/drill/testdata/subqueries/x1.parquet, numFiles=1, columns=[`c1`]]])
{code}


With `planner.slice_target` = 1 (result is still correct):
{code}
0: jdbc:drill:schema=dfs> alter session set `planner.slice_target` = 1;
+-------+--------------------------------+
|  ok   |            summary             |
+-------+--------------------------------+
| true  | planner.slice_target updated.  |
+-------+--------------------------------+
1 row selected (0.104 seconds)

0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . >         count(*)     as count_star,
. . . . . . . . . . . . >         sum(x2.c1)   as simle_sum,
. . . . . . . . . . . . >         round(avg(x2.c1))  as round_avg_x2_c1,
. . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         `x1.parquet` x1 left outer join `x2.parquet` x2 on x1.c1 = x2.c1
. . . . . . . . . . . . > group by
. . . . . . . . . . . . >         x1.c1;
+-------------+------------+------------------+----------------+
| count_star  | simle_sum  | round_avg_x2_c1  | sum_with_case  |
+-------------+------------+------------------+----------------+
| 1           | null       | null             | null           |
+-------------+------------+------------------+----------------+
1 row selected (1.959 seconds)
0: jdbc:drill:schema=dfs> explain plan for select
. . . . . . . . . . . . >         count(*)     as count_star,
. . . . . . . . . . . . >         sum(x2.c1)   as simle_sum,
. . . . . . . . . . . . >         round(avg(x2.c1))  as round_avg_x2_c1,
. . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         `x1.parquet` x1 left outer join `x2.parquet` x2 on x1.c1 = x2.c1
. . . . . . . . . . . . > group by
. . . . . . . . . . . . >         x1.c1;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(count_star=[$0], simle_sum=[$1], round_avg_x2_c1=[$2], sum_with_case=[$3])
00-02        UnionExchange
01-01          Project(count_star=[$1], simle_sum=[CASE(=($3, 0), null, $2)], round_avg_x2_c1=[ROUND(CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY NOT NULL)], sum_with_case=[CASE(=($5, 0), null, $4)])
01-02            HashAgg(group=[{0}], count_star=[COUNT()], agg#1=[$SUM0($1)], agg#2=[COUNT($1)], agg#3=[$SUM0($2)], agg#4=[COUNT($2)])
01-03              Project(c1=[$0], c10=[$1], $f2=[$2])
01-04                HashToRandomExchange(dist0=[[$0]])
02-01                  UnorderedMuxExchange
03-01                    Project(c1=[$0], c10=[$1], $f2=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
03-02                      Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY, ROUND(/($1, 12)))])
03-03                        Project(c1=[$1], c10=[$0])
03-04                          HashJoin(condition=[=($1, $0)], joinType=[right])
03-06                            Project(c1=[$0])
03-08                              HashToRandomExchange(dist0=[[$0]])
04-01                                UnorderedMuxExchange
06-01                                  Project(c1=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
06-02                                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/x2.parquet]], selectionRoot=maprfs:/drill/testdata/subqueries/x2.parquet, numFiles=1, columns=[`c1`]]])
03-05                            Project(c10=[$0])
03-07                              Project(c1=[$0])
03-09                                HashToRandomExchange(dist0=[[$0]])
05-01                                  UnorderedMuxExchange
07-01                                    Project(c1=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
07-02                                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/x1.parquet]], selectionRoot=maprfs:/drill/testdata/subqueries/x1.parquet, numFiles=1, columns=[`c1`]]])
{code}
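
For completeness: the expected behaviour follows from standard SQL aggregate semantics. SUM skips NULL inputs and returns NULL when every input in the group is NULL, which is exactly what the CASE expression produces for the unmatched row on the null-producing side of the left outer join. A self-contained sketch of that semantics (illustrative literals only; the VALUES-with-column-alias syntax is my assumption, not taken from the transcripts above):

{code:sql}
-- c1 is NULL for the single illustrative row, so the CASE expression evaluates
-- to round(NULL/12) = NULL, and SUM over an all-NULL group must return NULL.
select
        sum(c1)                                               as simple_sum,
        sum(case when c1 = 0 then 100 else round(c1/12) end)  as sum_with_case
from (values (cast(null as double))) as t(c1);
{code}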

> SUM with CASE statement on the column of the null producing side of left outer join returns wrong result
> --------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2247
>                 URL: https://issues.apache.org/jira/browse/DRILL-2247
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Victoria Markman
>             Fix For: 1.1.0
>
>         Attachments: x1.parquet, x2.parquet
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from x1;
> +------------+
> |     c1     |
> +------------+
> | 1032.6516  |
> +------------+
> 1 row selected (0.093 seconds)
> 0: jdbc:drill:schema=dfs> select * from x2;
> +------------+
> |     c1     |
> +------------+
> | 1057.3117  |
> | 1090.8299  |
> +------------+
> 2 rows selected (0.085 seconds)
> {code}
> Correct result is NULL:
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >         sum(x2.c1)                                                  as simple_sum,
> . . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >         x1 left outer join x2 on x1.c1 = x2.c1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . >         x1.c1;
> +------------+---------------+
> | simple_sum | sum_with_case |
> +------------+---------------+
> | 0.0000     | null          |
> +------------+---------------+
> 1 row selected (0.173 seconds)
> {code}
> Query plan for correct result:
> {code}
> 00-01      Project(simple_sum=[$0], sum_with_case=[$1])
> 00-02        UnionExchange
> 01-01          Project(simple_sum=[$1], sum_with_case=[$2])
> 01-02            HashAgg(group=[{0}], simple_sum=[SUM($1)], sum_with_case=[SUM($2)])
> 01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL, ROUND(/($1, 12)))])
> 01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
> 01-06                  HashToRandomExchange(dist0=[[$0]])
> 02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
> 01-05                  Project(c10=[$0])
> 01-07                    HashToRandomExchange(dist0=[[$0]])
> 03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
> {code}
> Wrong result: sum with case returns 0.0 instead of NULL
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >         count(*)                                                    as count_star,
> . . . . . . . . . . . . >         sum(x2.c1)                                                  as simle_sum,
> . . . . . . . . . . . . >         round(avg(x2.c1))                                           as round_avg_x2_c1,
> . . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >         x1 left outer join x2 on x1.c1 = x2.c1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . >         x1.c1;
> +------------+------------+-----------------+---------------+
> | count_star | simle_sum  | round_avg_x2_c1 | sum_with_case |
> +------------+------------+-----------------+---------------+
> | 1          | null       | null            | 0.0           |
> +------------+------------+-----------------+---------------+
> {code}
> Query plan for the wrong result query:
> {code}
> 00-01      Project(count_star=[$0], simle_sum=[$1], round_avg_x2_c1=[$2], sum_with_case=[$3])
> 00-02        UnionExchange
> 01-01          Project(count_star=[$1], simle_sum=[CASE(=($3, 0), null, $2)], round_avg_x2_c1=[ROUND(CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY)], sum_with_case=[$4])
> 01-02            HashAgg(group=[{0}], count_star=[COUNT()], agg#1=[$SUM0($1)], agg#2=[COUNT($1)], agg#3=[$SUM0($2)])
> 01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL, ROUND(/($1, 12)))])
> 01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
> 01-06                  HashToRandomExchange(dist0=[[$0]])
> 02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
> 01-05                  Project(c10=[$0])
> 01-07                    HashToRandomExchange(dist0=[[$0]])
> 03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
> {code}
> Cut/paste version of the query; comment out the round function and the query will return the correct result.
> {code:sql}
> select
>         count(*)     as count_star,                                
>         sum(x2.c1)   as simle_sum,                                 
>         round(avg(x2.c1))  as round_avg_x2_c1,                     
>         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> from
>         x1 left outer join x2 on x1.c1 = x2.c1
> group by
>         x1.c1;
> {code}
> Not sure about how critical this bug is. Inconsistency is bad.


