drill-issues mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2247) SUM with CASE statement on the column of the null producing side of left outer join returns wrong result
Date Thu, 02 Jul 2015 14:21:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612007#comment-14612007 ]

Aman Sinha commented on DRILL-2247:
-----------------------------------

There are a few components of this query that make it difficult to analyze.  The first step
would be to do the left outer join and the aggregate functions on an integer/bigint column
instead of a floating-point one.  You can still use the round() function, since the result of
avg() or the division would be floating point.  Doing joins on floating-point numbers is going
to be imprecise (decimal is fine..when we support it).
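
The concern about float join keys can be seen outside Drill as well; a quick Python illustration (not Drill code) of why equality comparisons on binary floating-point values are unreliable:

```python
# 0.1 and 0.2 have no exact binary (IEEE 754) representation, so the
# sum differs from the literal 0.3 in the last bits. An equality join
# on such keys can silently fail to match rows that "look" equal.
x = 0.1 + 0.2
y = 0.3
print(x == y)  # False
print(x)       # 0.30000000000000004
```

The same effect applies to any engine that compares binary floats for equality, which is why an exact type (integer/bigint, or decimal once supported) is the safer join key.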

> SUM with CASE statement on the column of the null producing side of left outer join returns wrong result
> --------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2247
>                 URL: https://issues.apache.org/jira/browse/DRILL-2247
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Jinfeng Ni
>             Fix For: 1.2.0
>
>         Attachments: x1.parquet, x2.parquet
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from x1;
> +------------+
> |     c1     |
> +------------+
> | 1032.6516  |
> +------------+
> 1 row selected (0.093 seconds)
> 0: jdbc:drill:schema=dfs> select * from x2;
> +------------+
> |     c1     |
> +------------+
> | 1057.3117  |
> | 1090.8299  |
> +------------+
> 2 rows selected (0.085 seconds)
> {code}
> The correct result is NULL:
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >         sum(x2.c1) as simple_sum,
> . . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >         x1 left outer join x2 on x1.c1 = x2.c1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . >         x1.c1;
> +------------+---------------+
> | simple_sum | sum_with_case |
> +------------+---------------+
> | 0.0000     | null          |
> +------------+---------------+
> 1 row selected (0.173 seconds)
> {code}
> Query plan for correct result:
> {code}
> 00-01      Project(simple_sum=[$0], sum_with_case=[$1])
> 00-02        UnionExchange
> 01-01          Project(simple_sum=[$1], sum_with_case=[$2])
> 01-02            HashAgg(group=[{0}], simple_sum=[SUM($1)], sum_with_case=[SUM($2)])
> 01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL, ROUND(/($1, 12)))])
> 01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
> 01-06                  HashToRandomExchange(dist0=[[$0]])
> 02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
> 01-05                  Project(c10=[$0])
> 01-07                    HashToRandomExchange(dist0=[[$0]])
> 03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
> {code}
> Wrong result: sum with case returns 0.0 instead of NULL:
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >         count(*) as count_star,
> . . . . . . . . . . . . >         sum(x2.c1) as simle_sum,
> . . . . . . . . . . . . >         round(avg(x2.c1)) as round_avg_x2_c1,
> . . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >         x1 left outer join x2 on x1.c1 = x2.c1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . >         x1.c1;
> +------------+------------+-----------------+---------------+
> | count_star | simle_sum  | round_avg_x2_c1 | sum_with_case |
> +------------+------------+-----------------+---------------+
> | 1          | null       | null            | 0.0           |
> +------------+------------+-----------------+---------------+
> {code}
> Query plan for the wrong result query:
> {code}
> 00-01      Project(count_star=[$0], simle_sum=[$1], round_avg_x2_c1=[$2], sum_with_case=[$3])
> 00-02        UnionExchange
> 01-01          Project(count_star=[$1], simle_sum=[CASE(=($3, 0), null, $2)], round_avg_x2_c1=[ROUND(CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY)], sum_with_case=[$4])
> 01-02            HashAgg(group=[{0}], count_star=[COUNT()], agg#1=[$SUM0($1)], agg#2=[COUNT($1)], agg#3=[$SUM0($2)])
> 01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL, ROUND(/($1, 12)))])
> 01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
> 01-06                  HashToRandomExchange(dist0=[[$0]])
> 02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
> 01-05                  Project(c10=[$0])
> 01-07                    HashToRandomExchange(dist0=[[$0]])
> 03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
> {code}
> Cut/paste version of the query; if you comment out the round() function, the query returns the correct result.
> {code:sql}
> select
>         count(*)     as count_star,                                
>         sum(x2.c1)   as simle_sum,                                 
>         round(avg(x2.c1))  as round_avg_x2_c1,                     
>         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> from
>         x1 left outer join x2 on x1.c1 = x2.c1
> group by
>         x1.c1;
> {code}
> Not sure how critical this bug is, but the inconsistency between the two queries is bad.
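
For reference, the expected NULL in sum_with_case follows from standard SQL NULL-propagation and aggregate rules: the unmatched left-join row carries x2.c1 = NULL, so the CASE comparison is not true, round(NULL/12) is NULL, and SUM over only NULL inputs is NULL. A minimal Python sketch of those rules (the helper names sql_sum and case_expr are hypothetical, not Drill APIs):

```python
def sql_sum(values):
    """SUM per SQL semantics: skip NULLs; return NULL if no non-NULL input."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None

def case_expr(c1):
    """CASE WHEN c1 = 0 THEN 100 ELSE ROUND(c1/12) END, with NULL propagation."""
    if c1 is None:
        return None  # NULL = 0 is not true, and ROUND(NULL/12) is NULL
    return 100 if c1 == 0 else round(c1 / 12)

# The left outer join of x1 (1032.6516) against x2 matches nothing,
# producing a single row with x2.c1 = NULL.
x2_c1_values = [None]
print(sql_sum(x2_c1_values))                        # None -> simple_sum is NULL
print(sql_sum(case_expr(v) for v in x2_c1_values))  # None -> sum_with_case should be NULL
```

Under these rules both aggregates must be NULL for the unmatched group, so the 0.0 in the second query is the planner rewriting one of them incorrectly, not a legitimate alternative answer.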



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
