Return-Path: X-Original-To: apmail-drill-issues-archive@minotaur.apache.org Delivered-To: apmail-drill-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 92FB617B2B for ; Sat, 10 Jan 2015 01:31:34 +0000 (UTC) Received: (qmail 8400 invoked by uid 500); 10 Jan 2015 01:31:35 -0000 Delivered-To: apmail-drill-issues-archive@drill.apache.org Received: (qmail 8299 invoked by uid 500); 10 Jan 2015 01:31:35 -0000 Mailing-List: contact issues-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list issues@drill.apache.org Received: (qmail 8192 invoked by uid 99); 10 Jan 2015 01:31:35 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 10 Jan 2015 01:31:35 +0000 Date: Sat, 10 Jan 2015 01:31:35 +0000 (UTC) From: "Victoria Markman (JIRA)" To: issues@drill.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Created] (DRILL-1977) Wrong result with aggregation on top of UNION ALL operator MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 Victoria Markman created DRILL-1977: --------------------------------------- Summary: Wrong result with aggregation on top of UNION ALL operator Key: DRILL-1977 URL: https://issues.apache.org/jira/browse/DRILL-1977 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 0.8.0 Reporter: Victoria Markman Assignee: Jinfeng Ni {code} Jan 10 01:17:13 root@~ ] # hadoop fs -cat /test/t.json { "a1": 0, "b1": 0, "c1": "true", "d1" : "2015-01-02"} { "a1": 0, "b1": 0, "c1": "false" , "d1" : "2015-01-03"} { "a1": 0, "b1": 0, "c1": "false" , "d1" : "2015-01-04"} { "a1": 1, "b1": 1, "c1": "true" , "d1" : "2015-01-05"} { "a1": 1, "b1": 1, "c1": "true" , "d1" : "2015-01-06"} {code} Query below returns wrong result, we should have 4 groups, instead we get only two groups from right side of the union. Notice lack of project step above union all (unclear if it has anything to to with the bug) {code} 0: jdbc:drill:schema=dfs> select calc1, max(a1), min(b1), count(c1) from (select a1+10 as calc1, a1, b1, c1 from `t.json` union all select a1+100 as calc1, a1, b1, c1 from `t.json`) group by calc1; +------------+------------+------------+------------+ | calc1 | EXPR$1 | EXPR$2 | EXPR$3 | +------------+------------+------------+------------+ | 100 | 0 | 0 | 3 | | 101 | 1 | 1 | 2 | +------------+------------+------------+------------+ 2 rows selected (0.181 seconds) 00-01 Project(calc1=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3]) 00-02 HashAgg(group=[{0}], EXPR$1=[MAX($1)], EXPR$2=[MIN($2)], EXPR$3=[COUNT($3)]) 00-03 UnionAll(all=[true]) 00-05 Project(calc1=[+($2, 10)], a1=[$2], b1=[$1], c1=[$0]) 00-07 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]]) 00-04 Project(calc1=[+($2, 100)], a1=[$2], b1=[$1], c1=[$0]) 00-06 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]]) {code} Correct result, when order of the columns is different in aggregate functions: "min(b1), max(a1)" instead of "max(a1), min(b1)" {code} 0: jdbc:drill:schema=dfs> select calc1, min(b1), max(a1), count(c1) from (select a1+10 as calc1, a1, b1, c1 from `t.json` union all select a1+100 as calc1, a1, b1, c1 from `t.json`) group by calc1; +------------+------------+------------+------------+ | calc1 | EXPR$1 | EXPR$2 | EXPR$3 | +------------+------------+------------+------------+ | 10 | 0 | 0 | 3 | | 11 | 1 | 1 | 2 | | 100 | 0 | 0 | 3 | | 101 | 1 | 1 | 2 | +------------+------------+------------+------------+ 4 rows selected (0.256 seconds) 00-01 Project(calc1=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3]) 00-02 HashAgg(group=[{0}], EXPR$1=[MIN($1)], EXPR$2=[MAX($2)], EXPR$3=[COUNT($3)]) 00-03 Project(calc1=[$0], b1=[$2], a1=[$1], c1=[$3]) 00-04 UnionAll(all=[true]) 00-06 Project(calc1=[+($2, 10)], a1=[$2], b1=[$1], c1=[$0]) 00-08 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]]) 00-05 Project(calc1=[+($2, 100)], a1=[$2], b1=[$1], c1=[$0]) 00-07 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]]){code} Non-union query works correctly: {code} 0: jdbc:drill:schema=dfs> select a1+10 as calc1, min(a1), max(b1), count(c1) from `t.json` group by a1+10; +------------+------------+------------+------------+ | calc1 | EXPR$1 | EXPR$2 | EXPR$3 | +------------+------------+------------+------------+ | 10 | 0 | 0 | 3 | | 11 | 1 | 1 | 2 | +------------+------------+------------+------------+ 2 rows selected (0.142 seconds) 0: jdbc:drill:schema=dfs> select a1+10 as calc1, min(b1), max(a1), count(c1) from `t.json` group by a1+10; +------------+------------+------------+------------+ | calc1 | EXPR$1 | EXPR$2 | EXPR$3 | +------------+------------+------------+------------+ | 10 | 0 | 0 | 3 | | 11 | 1 | 1 | 2 | +------------+------------+------------+------------+ 2 rows selected (0.153 seconds) {code} If I switch legs of a UNION, we still get wrong result, but from the right side: {code} 0: jdbc:drill:schema=dfs> select calc1, max(a1), min(b1), count(c1) from (select a1+100 as calc1, a1, b1, c1 from `t.json` union all select a1+10 as calc1, a1, b1, c1 from `t.json`) group by calc1; +------------+------------+------------+------------+ | calc1 | EXPR$1 | EXPR$2 | EXPR$3 | +------------+------------+------------+------------+ | 10 | 0 | 0 | 3 | | 11 | 1 | 1 | 2 | +------------+------------+------------+------------+ 2 rows selected (0.19 seconds) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)