Date: Thu, 2 Jul 2015 14:21:04 +0000 (UTC)
From: "Aman Sinha (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Commented] (DRILL-2247) SUM with CASE statement on the column of the null producing side of left outer join returns wrong result

    [ https://issues.apache.org/jira/browse/DRILL-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612007#comment-14612007 ]

Aman Sinha commented on DRILL-2247:
-----------------------------------

There are a few components of this query that make it difficult to analyze. The first step would be to run the left outer join and the aggregate functions on an integer/bigint column instead of a floating-point one. You can still use the round() function, since the result of avg() or of the division would be floating point.
Doing joins on floating-point numbers is going to be imprecise (decimal is fine, when we support it).

> SUM with CASE statement on the column of the null producing side of left outer join returns wrong result
> --------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2247
>                 URL: https://issues.apache.org/jira/browse/DRILL-2247
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Jinfeng Ni
>             Fix For: 1.2.0
>
>         Attachments: x1.parquet, x2.parquet
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from x1;
> +------------+
> |     c1     |
> +------------+
> | 1032.6516  |
> +------------+
> 1 row selected (0.093 seconds)
> 0: jdbc:drill:schema=dfs> select * from x2;
> +------------+
> |     c1     |
> +------------+
> | 1057.3117  |
> | 1090.8299  |
> +------------+
> 2 rows selected (0.085 seconds)
> {code}
> Correct result is NULL:
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >     sum(x2.c1) as simple_sum,
> . . . . . . . . . . . . >     sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >     x1 left outer join x2 on x1.c1 = x2.c1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . > x1.c1;
> +------------+---------------+
> | simple_sum | sum_with_case |
> +------------+---------------+
> | 0.0000     | null          |
> +------------+---------------+
> 1 row selected (0.173 seconds)
> {code}
> Query plan for correct result:
> {code}
> 00-01    Project(simple_sum=[$0], sum_with_case=[$1])
> 00-02    UnionExchange
> 01-01    Project(simple_sum=[$1], sum_with_case=[$2])
> 01-02    HashAgg(group=[{0}], simple_sum=[SUM($1)], sum_with_case=[SUM($2)])
> 01-03    Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL, ROUND(/($1, 12)))])
> 01-04    HashJoin(condition=[=($0, $1)], joinType=[left])
> 01-06    HashToRandomExchange(dist0=[[$0]])
> 02-01    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
> 01-05    Project(c10=[$0])
> 01-07    HashToRandomExchange(dist0=[[$0]])
> 03-01    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
> {code}
> Wrong result: sum with case returns 0.0 instead of NULL
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >     count(*) as count_star,
> . . . . . . . . . . . . >     sum(x2.c1) as simle_sum,
> . . . . . . . . . . . . >     round(avg(x2.c1)) as round_avg_x2_c1,
> . . . . . . . . . . . . >     sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >     x1 left outer join x2 on x1.c1 = x2.c1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . > x1.c1;
> +------------+------------+-----------------+---------------+
> | count_star | simle_sum  | round_avg_x2_c1 | sum_with_case |
> +------------+------------+-----------------+---------------+
> | 1          | null       | null            | 0.0           |
> +------------+------------+-----------------+---------------+
> {code}
> Query plan for the wrong result query:
> {code}
> 00-01    Project(count_star=[$0], simle_sum=[$1], round_avg_x2_c1=[$2], sum_with_case=[$3])
> 00-02    UnionExchange
> 01-01    Project(count_star=[$1], simle_sum=[CASE(=($3, 0), null, $2)], round_avg_x2_c1=[ROUND(CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY)], sum_with_case=[$4])
> 01-02    HashAgg(group=[{0}], count_star=[COUNT()], agg#1=[$SUM0($1)], agg#2=[COUNT($1)], agg#3=[$SUM0($2)])
> 01-03    Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL, ROUND(/($1, 12)))])
> 01-04    HashJoin(condition=[=($0, $1)], joinType=[left])
> 01-06    HashToRandomExchange(dist0=[[$0]])
> 02-01    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
> 01-05    Project(c10=[$0])
> 01-07    HashToRandomExchange(dist0=[[$0]])
> 03-01    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
> {code}
> Cut/paste version of the query; comment out the round function and the query will return the correct result:
> {code:sql}
> select
>     count(*) as count_star,
>     sum(x2.c1) as simle_sum,
>     round(avg(x2.c1)) as round_avg_x2_c1,
>     sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> from
>     x1 left outer join x2 on x1.c1 = x2.c1
> group by
>     x1.c1;
> {code}
> Not sure how critical this bug is. Inconsistency is bad.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
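For reference, a sketch of the integer-column repro suggested in the comment above. The table names y1 and y2 are hypothetical bigint-typed versions of x1 and x2 (they do not exist in the attached data); the point is only that the join predicate then compares exact values, while round() still applies to the floating-point avg():

{code:sql}
-- Hypothetical repro over integer columns; y1/y2 are assumed bigint
-- copies of x1/x2, so y1.c1 = y2.c1 is an exact comparison.
select
    count(*) as count_star,
    sum(y2.c1) as simple_sum,
    round(avg(y2.c1)) as round_avg_y2_c1,
    sum(case when y2.c1 = 0 then 100 else round(y2.c1/12) end) as sum_with_case
from
    y1 left outer join y2 on y1.c1 = y2.c1
group by
    y1.c1;
{code}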