drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Victoria Markman (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-2247) SUM with CASE statement on the column of the null producing side of left outer join returns wrong result
Date Fri, 13 Feb 2015 23:20:13 GMT

     [ https://issues.apache.org/jira/browse/DRILL-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Victoria Markman updated DRILL-2247:
------------------------------------
    Description: 
{code}
0: jdbc:drill:schema=dfs> select * from x1;
+------------+
|     c1     |
+------------+
| 1032.6516  |
+------------+
1 row selected (0.093 seconds)

0: jdbc:drill:schema=dfs> select * from x2;
+------------+
|     c1     |
+------------+
| 1057.3117  |
| 1090.8299  |
+------------+
2 rows selected (0.085 seconds)
{code}
Correct result is NULL:
{code}
0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . >         sum(x2.c1)                                          
      as simple_sum,
. . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12)
end) as sum_with_case
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         x1 left outer join x2 on x1.c1 = x2.c1
. . . . . . . . . . . . > group by
. . . . . . . . . . . . >         x1.c1;
+------------+---------------+
| simple_sum | sum_with_case |
+------------+---------------+
| 0.0000     | null          |
+------------+---------------+
1 row selected (0.173 seconds)
{code}

Query plan for correct result:
{code}
00-01      Project(simple_sum=[$0], sum_with_case=[$1])
00-02        UnionExchange
01-01          Project(simple_sum=[$1], sum_with_case=[$2])
01-02            HashAgg(group=[{0}], simple_sum=[SUM($1)], sum_with_case=[SUM($2)])
01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL,
ROUND(/($1, 12)))])
01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
01-06                  HashToRandomExchange(dist0=[[$0]])
02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]],
selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
01-05                  Project(c10=[$0])
01-07                    HashToRandomExchange(dist0=[[$0]])
03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]],
selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
{code}
Wrong result: sum with case returns 0.0 instead of NULL
{code}
0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . >         count(*)                                            
      as count_star,
. . . . . . . . . . . . >         sum(x2.c1)                                          
      as simle_sum,
. . . . . . . . . . . . >         round(avg(x2.c1))                                   
      as round_avg_x2_c1,
. . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12)
end) as sum_with_case
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         x1 left outer join x2 on x1.c1 = x2.c1
. . . . . . . . . . . . > group by
. . . . . . . . . . . . >         x1.c1;
+------------+------------+-----------------+---------------+
| count_star | simle_sum  | round_avg_x2_c1 | sum_with_case |
+------------+------------+-----------------+---------------+
| 1          | null       | null            | 0.0           |
+------------+------------+-----------------+---------------+
{code}

Query plan for the wrong result query:
{code}
00-01      Project(count_star=[$0], simle_sum=[$1], round_avg_x2_c1=[$2], sum_with_case=[$3])
00-02        UnionExchange
01-01          Project(count_star=[$1], simle_sum=[CASE(=($3, 0), null, $2)], round_avg_x2_c1=[ROUND(CAST(/(CastHigh(CASE(=($3,
0), null, $2)), $3)):ANY)], sum_with_case=[$4])
01-02            HashAgg(group=[{0}], count_star=[COUNT()], agg#1=[$SUM0($1)], agg#2=[COUNT($1)],
agg#3=[$SUM0($2)])
01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL,
ROUND(/($1, 12)))])
01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
01-06                  HashToRandomExchange(dist0=[[$0]])
02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]],
selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
01-05                  Project(c10=[$0])
01-07                    HashToRandomExchange(dist0=[[$0]])
03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]],
selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
{code}

Cut/paste version of the query, comment out round function, query will return correct result.
{code:sql}
select
        count(*)     as count_star,                                
        sum(x2.c1)   as simle_sum,                                 
        round(avg(x2.c1))  as round_avg_x2_c1,                     
        sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
from
        x1 left outer join x2 on x1.c1 = x2.c1
group by
        x1.c1;
{code}

Not sure about how critical this bug is. Inconsistency is bad.

  was:
{code}
0: jdbc:drill:schema=dfs> select * from x1;
+------------+
|     c1     |
+------------+
| 1032.6516  |
+------------+
1 row selected (0.093 seconds)

0: jdbc:drill:schema=dfs> select * from x2;
+------------+
|     c1     |
+------------+
| 1057.3117  |
| 1090.8299  |
+------------+
2 rows selected (0.085 seconds)
{code}
Correct result is NULL:
{code}
0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . >         sum(x2.c1)                                          
      as simple_sum,
. . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12)
end) as sum_with_case
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         x1 left outer join x2 on x1.c1 = x2.c1
. . . . . . . . . . . . > group by
. . . . . . . . . . . . >         x1.c1;
+------------+---------------+
| simple_sum | sum_with_case |
+------------+---------------+
| 0.0000     | null          |
+------------+---------------+
1 row selected (0.173 seconds)
{code}

Query plan for correct result:
{code}
00-01      Project(simple_sum=[$0], sum_with_case=[$1])
00-02        UnionExchange
01-01          Project(simple_sum=[$1], sum_with_case=[$2])
01-02            HashAgg(group=[{0}], simple_sum=[SUM($1)], sum_with_case=[SUM($2)])
01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL,
ROUND(/($1, 12)))])
01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
01-06                  HashToRandomExchange(dist0=[[$0]])
02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]],
selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
01-05                  Project(c10=[$0])
01-07                    HashToRandomExchange(dist0=[[$0]])
03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]],
selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
{code}
Wrong result: sum with case returns 0.0 instead of NULL
{code}
0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . >         count(*)                                            
      as count_star,
. . . . . . . . . . . . >         sum(x2.c1)                                          
      as simle_sum,
. . . . . . . . . . . . >         round(avg(x2.c1))                                   
      as round_avg_x2_c1,
. . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12)
end) as sum_with_case
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         x1 left outer join x2 on x1.c1 = x2.c1
. . . . . . . . . . . . > group by
. . . . . . . . . . . . >         x1.c1;
+------------+------------+-----------------+---------------+
| count_star | simle_sum  | round_avg_x2_c1 | sum_with_case |
+------------+------------+-----------------+---------------+
| 1          | null       | null            | 0.0           |
+------------+------------+-----------------+---------------+
{code}

Query plan for the wrong result query:
{code}
00-01      Project(count_star=[$0], simle_sum=[$1], round_avg_x2_c1=[$2], sum_with_case=[$3])
00-02        UnionExchange
01-01          Project(count_star=[$1], simle_sum=[CASE(=($3, 0), null, $2)], round_avg_x2_c1=[ROUND(CAST(/(CastHigh(CASE(=($3,
0), null, $2)), $3)):ANY)], sum_with_case=[$4])
01-02            HashAgg(group=[{0}], count_star=[COUNT()], agg#1=[$SUM0($1)], agg#2=[COUNT($1)],
agg#3=[$SUM0($2)])
01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL,
ROUND(/($1, 12)))])
01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
01-06                  HashToRandomExchange(dist0=[[$0]])
02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]],
selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
01-05                  Project(c10=[$0])
01-07                    HashToRandomExchange(dist0=[[$0]])
03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]],
selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
{code}

Cut/paste version of the query, comment out round function, query will return correct result.
{code:sql}
select
        count(*)     as count_star,                                
        sum(x2.c1)   as simle_sum,                                 
        round(avg(x2.c1))  as round_avg_x2_c1,                     
        sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
from
        x1 left outer join x2 on x1.c1 = x2.c1
group by
        x1.c1;
{code}



> SUM with CASE statement on the column of the null producing side of left outer join returns
wrong result
> --------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2247
>                 URL: https://issues.apache.org/jira/browse/DRILL-2247
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Jinfeng Ni
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from x1;
> +------------+
> |     c1     |
> +------------+
> | 1032.6516  |
> +------------+
> 1 row selected (0.093 seconds)
> 0: jdbc:drill:schema=dfs> select * from x2;
> +------------+
> |     c1     |
> +------------+
> | 1057.3117  |
> | 1090.8299  |
> +------------+
> 2 rows selected (0.085 seconds)
> {code}
> Correct result is NULL:
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >         sum(x2.c1)                                     
           as simple_sum,
> . . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12)
end) as sum_with_case
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >         x1 left outer join x2 on x1.c1 = x2.c1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . >         x1.c1;
> +------------+---------------+
> | simple_sum | sum_with_case |
> +------------+---------------+
> | 0.0000     | null          |
> +------------+---------------+
> 1 row selected (0.173 seconds)
> {code}
> Query plan for correct result:
> {code}
> 00-01      Project(simple_sum=[$0], sum_with_case=[$1])
> 00-02        UnionExchange
> 01-01          Project(simple_sum=[$1], sum_with_case=[$2])
> 01-02            HashAgg(group=[{0}], simple_sum=[SUM($1)], sum_with_case=[SUM($2)])
> 01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL,
ROUND(/($1, 12)))])
> 01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
> 01-06                  HashToRandomExchange(dist0=[[$0]])
> 02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
> 01-05                  Project(c10=[$0])
> 01-07                    HashToRandomExchange(dist0=[[$0]])
> 03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
> {code}
> Wrong result: sum with case returns 0.0 instead of NULL
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >         count(*)                                       
           as count_star,
> . . . . . . . . . . . . >         sum(x2.c1)                                     
           as simle_sum,
> . . . . . . . . . . . . >         round(avg(x2.c1))                              
           as round_avg_x2_c1,
> . . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12)
end) as sum_with_case
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >         x1 left outer join x2 on x1.c1 = x2.c1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . >         x1.c1;
> +------------+------------+-----------------+---------------+
> | count_star | simle_sum  | round_avg_x2_c1 | sum_with_case |
> +------------+------------+-----------------+---------------+
> | 1          | null       | null            | 0.0           |
> +------------+------------+-----------------+---------------+
> {code}
> Query plan for the wrong result query:
> {code}
> 00-01      Project(count_star=[$0], simle_sum=[$1], round_avg_x2_c1=[$2], sum_with_case=[$3])
> 00-02        UnionExchange
> 01-01          Project(count_star=[$1], simle_sum=[CASE(=($3, 0), null, $2)], round_avg_x2_c1=[ROUND(CAST(/(CastHigh(CASE(=($3,
0), null, $2)), $3)):ANY)], sum_with_case=[$4])
> 01-02            HashAgg(group=[{0}], count_star=[COUNT()], agg#1=[$SUM0($1)], agg#2=[COUNT($1)],
agg#3=[$SUM0($2)])
> 01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL,
ROUND(/($1, 12)))])
> 01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
> 01-06                  HashToRandomExchange(dist0=[[$0]])
> 02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
> 01-05                  Project(c10=[$0])
> 01-07                    HashToRandomExchange(dist0=[[$0]])
> 03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
> {code}
> Cut/paste version of the query, comment out round function, query will return correct
result.
> {code:sql}
> select
>         count(*)     as count_star,                                
>         sum(x2.c1)   as simle_sum,                                 
>         round(avg(x2.c1))  as round_avg_x2_c1,                     
>         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> from
>         x1 left outer join x2 on x1.c1 = x2.c1
> group by
>         x1.c1;
> {code}
> Not sure about how critical this bug is. Inconsistency is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message