Return-Path: X-Original-To: apmail-drill-issues-archive@minotaur.apache.org Delivered-To: apmail-drill-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 8C84617DAD for ; Tue, 3 Feb 2015 22:02:50 +0000 (UTC) Received: (qmail 30657 invoked by uid 500); 3 Feb 2015 22:02:35 -0000 Delivered-To: apmail-drill-issues-archive@drill.apache.org Received: (qmail 30629 invoked by uid 500); 3 Feb 2015 22:02:35 -0000 Mailing-List: contact issues-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list issues@drill.apache.org Received: (qmail 30617 invoked by uid 99); 3 Feb 2015 22:02:35 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 03 Feb 2015 22:02:35 +0000 Date: Tue, 3 Feb 2015 22:02:35 +0000 (UTC) From: "Jacques Nadeau (JIRA)" To: issues@drill.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (DRILL-1977) Wrong result with aggregation on top of UNION ALL operator MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/DRILL-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau updated DRILL-1977: ---------------------------------- Fix Version/s: 0.9.0 > Wrong result with aggregation on top of UNION ALL operator > ---------------------------------------------------------- > > Key: DRILL-1977 > URL: https://issues.apache.org/jira/browse/DRILL-1977 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization > Affects Versions: 0.8.0 > Reporter: Victoria Markman > Assignee: Jinfeng Ni > Priority: Critical > Fix For: 0.9.0 > > > {code} > Jan 10 01:17:13 root@~ ] # hadoop fs -cat /test/t.json > { "a1": 0, "b1": 0, "c1": "true", "d1" : "2015-01-02"} > { "a1": 0, "b1": 0, "c1": "false" , "d1" : "2015-01-03"} > { "a1": 0, "b1": 0, "c1": "false" , "d1" : "2015-01-04"} > { "a1": 1, "b1": 1, "c1": "true" , "d1" : "2015-01-05"} > { "a1": 1, "b1": 1, "c1": "true" , "d1" : "2015-01-06"} > {code} > Query below returns wrong result, we should have 4 groups, instead we get only two groups from right side of the union. > Notice lack of project step above union all (unclear if it has anything to to with the bug) > {code} > 0: jdbc:drill:schema=dfs> select calc1, max(a1), min(b1), count(c1) from (select a1+10 as calc1, a1, b1, c1 from `t.json` union all select a1+100 as calc1, a1, b1, c1 from `t.json`) group by calc1; > +------------+------------+------------+------------+ > | calc1 | EXPR$1 | EXPR$2 | EXPR$3 | > +------------+------------+------------+------------+ > | 100 | 0 | 0 | 3 | > | 101 | 1 | 1 | 2 | > +------------+------------+------------+------------+ > 2 rows selected (0.181 seconds) > 00-01 Project(calc1=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3]) > 00-02 HashAgg(group=[{0}], EXPR$1=[MAX($1)], EXPR$2=[MIN($2)], EXPR$3=[COUNT($3)]) > 00-03 UnionAll(all=[true]) > 00-05 Project(calc1=[+($2, 10)], a1=[$2], b1=[$1], c1=[$0]) > 00-07 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]]) > 00-04 Project(calc1=[+($2, 100)], a1=[$2], b1=[$1], c1=[$0]) > 00-06 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]]) > {code} > Correct result, when order of the columns is different in aggregate functions: "min(b1), max(a1)" instead of "max(a1), min(b1)" > {code} > 0: jdbc:drill:schema=dfs> select calc1, min(b1), max(a1), count(c1) from (select a1+10 as calc1, a1, b1, c1 from `t.json` union all select a1+100 as calc1, a1, b1, c1 from `t.json`) group by calc1; > +------------+------------+------------+------------+ > | calc1 | EXPR$1 | EXPR$2 | EXPR$3 | > +------------+------------+------------+------------+ > | 10 | 0 | 0 | 3 | > | 11 | 1 | 1 | 2 | > | 100 | 0 | 0 | 3 | > | 101 | 1 | 1 | 2 | > +------------+------------+------------+------------+ > 4 rows selected (0.256 seconds) > 00-01 Project(calc1=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3]) > 00-02 HashAgg(group=[{0}], EXPR$1=[MIN($1)], EXPR$2=[MAX($2)], EXPR$3=[COUNT($3)]) > 00-03 Project(calc1=[$0], b1=[$2], a1=[$1], c1=[$3]) > 00-04 UnionAll(all=[true]) > 00-06 Project(calc1=[+($2, 10)], a1=[$2], b1=[$1], c1=[$0]) > 00-08 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]]) > 00-05 Project(calc1=[+($2, 100)], a1=[$2], b1=[$1], c1=[$0]) > 00-07 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]]){code} > Non-union query works correctly: > {code} > 0: jdbc:drill:schema=dfs> select a1+10 as calc1, min(a1), max(b1), count(c1) from `t.json` group by a1+10; > +------------+------------+------------+------------+ > | calc1 | EXPR$1 | EXPR$2 | EXPR$3 | > +------------+------------+------------+------------+ > | 10 | 0 | 0 | 3 | > | 11 | 1 | 1 | 2 | > +------------+------------+------------+------------+ > 2 rows selected (0.142 seconds) > 0: jdbc:drill:schema=dfs> select a1+10 as calc1, min(b1), max(a1), count(c1) from `t.json` group by a1+10; > +------------+------------+------------+------------+ > | calc1 | EXPR$1 | EXPR$2 | EXPR$3 | > +------------+------------+------------+------------+ > | 10 | 0 | 0 | 3 | > | 11 | 1 | 1 | 2 | > +------------+------------+------------+------------+ > 2 rows selected (0.153 seconds) > {code} > If I switch legs of a UNION, we still get wrong result, but from the right side: > {code} > 0: jdbc:drill:schema=dfs> select calc1, max(a1), min(b1), count(c1) from (select a1+100 as calc1, a1, b1, c1 from `t.json` union all select a1+10 as calc1, a1, b1, c1 from `t.json`) group by calc1; > +------------+------------+------------+------------+ > | calc1 | EXPR$1 | EXPR$2 | EXPR$3 | > +------------+------------+------------+------------+ > | 10 | 0 | 0 | 3 | > | 11 | 1 | 1 | 2 | > +------------+------------+------------+------------+ > 2 rows selected (0.19 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)