Return-Path: X-Original-To: apmail-drill-issues-archive@minotaur.apache.org Delivered-To: apmail-drill-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7D1AB183E9 for ; Wed, 9 Mar 2016 00:16:41 +0000 (UTC) Received: (qmail 3083 invoked by uid 500); 9 Mar 2016 00:16:41 -0000 Delivered-To: apmail-drill-issues-archive@drill.apache.org Received: (qmail 2955 invoked by uid 500); 9 Mar 2016 00:16:41 -0000 Mailing-List: contact issues-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list issues@drill.apache.org Received: (qmail 2294 invoked by uid 99); 9 Mar 2016 00:16:41 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 09 Mar 2016 00:16:41 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id E7B422C1F68 for ; Wed, 9 Mar 2016 00:16:40 +0000 (UTC) Date: Wed, 9 Mar 2016 00:16:40 +0000 (UTC) From: "ASF GitHub Bot (JIRA)" To: issues@drill.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache drill 1.2.0) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/DRILL-4474?page=3Dcom.atlassian= .jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D1518= 6137#comment-15186137 ]=20 ASF GitHub Bot commented on DRILL-4474: --------------------------------------- GitHub user jinfengni opened a pull request: https://github.com/apache/drill/pull/416 DRILL-4474: Ensure that ConvertCountToDirectScan does not push through = project when nullable input of count is not RexInputRef =20 You can merge this pull request into a Git repository by running: $ git pull https://github.com/jinfengni/incubator-drill review/DRILL-44= 74 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/416.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #416 =20 ---- commit 0a5f8fab786f931665d9d28ea67cf19ab37c07fb Author: Jacques Nadeau Date: 2016-03-04T21:27:26Z DRILL-4474: Ensure that ConvertCountToDirectScan only pushes through pr= oject when project is trivial. commit ab00e6aa9563d79e62154ba1f3bbb71dba7d8036 Author: Jinfeng Ni Date: 2016-03-08T22:15:27Z DRILL-4474: Ensure that ConvertCountToDirectScan does not push through = project when nullable input of count is not RexInputRef ---- > Inconsistent behavior while using COUNT in select (Apache drill 1.2.0) > ---------------------------------------------------------------------- > > Key: DRILL-4474 > URL: https://issues.apache.org/jira/browse/DRILL-4474 > Project: Apache Drill > Issue Type: Bug > Affects Versions: 1.2.0, 1.5.0 > Environment: m3.xlarge AWS instances ( 3 nodes) > CentOS6.5 x64 > Reporter: Shankar > Assignee: Jacques Nadeau > Priority: Blocker > > {quote} > * We are using drill to retrieve the business data from game analytic.=20 > * We are running below queries on table of size 50GB (parquet) > * We have found some major inconsistency in data when we use COUNT functi= on. > * Below is the case by case queries and their output. {color:blue}*Please= analyse it carefully, to for clear understanding of behaviour. *{color} > * Please let me know how to resolve this ? (or any earlier JIRA has been = already created).=20 > * Hope this may be fixed in later versions. If not please do the needful. > {quote} > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= -- > CASE-1 (Wrong result) > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= -- > {color:red} > {quote} > {noformat} > 0: jdbc:drill:> select =20 > . . . . . . . > count(case when t.id =3D '/confirmDrop/btnYes/' and t.eve= nt =3D 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ;=20 > +-----------+ > | count | > +-----------+ > | 27645752 | > +-----------+ > 1 row selected (0.281 seconds) > {noformat} > {quote} > {color} > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= -- > CASE-2 (Wrong result) > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= -- > {color:red} > {quote} > {noformat} > 0: jdbc:drill:> select =20 > . . . . . . . > count(sessionid),=20 > . . . . . . . > count(case when t.id =3D '/confirmDrop/btnYes/' and t.eve= nt =3D 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ;=20 > +-----------+-------+ > | EXPR$0 | cnt | > +-----------+-------+ > | 37772844 | 2108 | > +-----------+-------+ > 1 row selected (12.597 seconds) > {noformat} > {quote} > {color} > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= -- > CASE-3 (Wrong result, only first count is correct) > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= -- > {color:red} > {quote} > {noformat} > 0: jdbc:drill:> select =20 > . . . . . . . > count(distinct sessionid),=20 > . . . . . . . > count(case when t.id =3D '/confirmDrop/btnYes/' and t.eve= nt =3D 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ;=20 > +---------+-----------+ > | EXPR$0 | cnt | > +---------+-----------+ > | 201941 | 37772844 | > +---------+-----------+ > 1 row selected (8.259 seconds) > {noformat} > {quote} > {color} > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= -- > CASE-4 (Correct result) > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= -- > {color:green} > {quote} > {noformat} > 0: jdbc:drill:> select =20 > . . . . . . . > count(distinct case when t.id =3D '/confirmDrop/btnYes/' = and t.event =3D 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ;=20 > +------+ > | cnt | > +------+ > | 525 | > +------+ > 1 row selected (14.318 seconds) > {noformat} > {quote} > {color} > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= -- > CASE-5 (Correct result) > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= -- > {color:green} > {quote} > {noformat} > 0: jdbc:drill:> select =20 > . . . . . . . > count(sessionid), > . . . . . . . > count(distinct sessionid) > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > where ( t.id =3D '/confirmDrop/btnYes/' and t.event =3D '= Click') > . . . . . . . > ; > +---------+---------+ > | EXPR$0 | EXPR$1 | > +---------+---------+ > | 2108 | 525 | > +---------+---------+ > 1 row selected (19.355 seconds) > {noformat} > {quote} > {color} > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= -- > CASE-6 > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= -- > {quote} > {noformat} > 0: jdbc:drill:> explain plan for=20 > . . . . . . . >=20 > . . . . . . . > select =20 > . . . . . . . > count(case when t.id =3D '/confirmDrop/btnYes/' and t.eve= nt =3D 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ;=20 > +------+------+ > | text | json | > +------+------+ > | 00-00 Screen > 00-01 Scan(groupscan=3D[org.apache.drill.exec.store.pojo.PojoRecordR= eader@73ff10e1]) > | { > "head" : { > "version" : 1, > "generator" : { > "type" : "ExplainHandler", > "info" : "" > }, > "type" : "APACHE_DRILL_PHYSICAL", > "options" : [ ], > "queue" : 0, > "resultMode" : "EXEC" > }, > "graph" : [ { > "pop" : "DirectGroupScan", > "@id" : 1, > "cost" : 20.0 > }, { > "pop" : "screen", > "@id" : 0, > "child" : 1, > "initialAllocation" : 1000000, > "maxAllocation" : 10000000000, > "cost" : 20.0 > } ] > } | > +------+------+ > 1 row selected (0.276 seconds) > {noformat} > {quote} > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= -- > CASE-7 > -------------------------------------------------------------------------= ---------------------------------------------------------------------------= -- > {quote} > {noformat} > 0: jdbc:drill:>=20 > 0: jdbc:drill:> explain plan for=20 > . . . . . . . >=20 > . . . . . . . > select =20 > . . . . . . . > count(distinct sessionid),=20 > . . . . . . . > count(case when t.id =3D '/confirmDrop/btnYes/' and t.eve= nt =3D 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ;=20 > +------+------+ > | text | json | > +------+------+ > | 00-00 Screen > 00-01 Project(EXPR$0=3D[$0], cnt=3D[$1]) > 00-02 Project(EXPR$0=3D[$1], cnt=3D[$0]) > 00-03 NestedLoopJoin(condition=3D[true], joinType=3D[inner]) > 00-05 Scan(groupscan=3D[org.apache.drill.exec.store.pojo.PojoR= ecordReader@59b7cda9]) > 00-04 StreamAgg(group=3D[{}], EXPR$0=3D[$SUM0($0)]) > 00-06 UnionExchange > 01-01 StreamAgg(group=3D[{}], EXPR$0=3D[COUNT($0)]) > 01-02 HashAgg(group=3D[{0}]) > 01-03 Project(sessionid=3D[$0]) > 01-04 HashToRandomExchange(dist0=3D[[$0]]) > 02-01 UnorderedMuxExchange > 03-01 Project(sessionid=3D[$0], E_X_P_R_H_A_S_H_= F_I_E_L_D=3D[castInt(hash64AsDouble($0))]) > 03-02 HashAgg(group=3D[{0}]) > 03-03 Scan(groupscan=3D[ParquetGroupScan [en= tries=3D[ReadEntryWithPath [path=3Dhdfs://namenode:9000/tmp/a_games_log_vis= it_base]], selectionRoot=3Dhdfs://namenode:9000/tmp/a_games_log_visit_base,= numFiles=3D1, usedMetadataFile=3Dfalse, columns=3D[`sessionid`]]]) > | { > "head" : { > "version" : 1, > "generator" : { > "type" : "ExplainHandler", > "info" : "" > }, > "type" : "APACHE_DRILL_PHYSICAL", > "options" : [ ], > "queue" : 0, > "resultMode" : "EXEC" > }, > "graph" : [ { > "pop" : "DirectGroupScan", > "@id" : 5, > "cost" : 20.0 > }, { > "pop" : "parquet-scan", > "@id" : 196611, > "userName" : "hadoop", > "entries" : [ { > "path" : "hdfs://namenode:9000/tmp/a_games_log_visit_base" > } ], > "storage" : { > "type" : "file", > "enabled" : true, > "connection" : "hdfs://namenode:9000", > "workspaces" : { > "root" : { > "location" : "/tmp/", > "writable" : true, > "defaultInputFormat" : null > }, > "tmp" : { > "location" : "/tmp", > "writable" : true, > "defaultInputFormat" : null > } > }, > "formats" : { > "psv" : { > "type" : "text", > "extensions" : [ "tbl" ], > "delimiter" : "|" > }, > "csv" : { > "type" : "text", > "extensions" : [ "csv" ], > "delimiter" : "," > }, > "tsv" : { > "type" : "text", > "extensions" : [ "tsv" ], > "delimiter" : "\t" > }, > "parquet" : { > "type" : "parquet" > }, > "json" : { > "type" : "json" > }, > "avro" : { > "type" : "avro" > } > } > }, > "format" : { > "type" : "parquet" > }, > "columns" : [ "`sessionid`" ], > "selectionRoot" : "hdfs://namenode:9000/tmp/a_games_log_visit_base", > "fileSet" : [ "/tmp/a_games_log_visit_base/1_6_15.parquet", "/tmp/a_g= ames_log_visit_base/1_1_9.parquet", "/tmp/a_games_log_visit_base/1_7_2.parq= uet", "/tmp/a_games_log_visit_base/1_5_7.parquet", "/tmp/a_games_log_visit_= base/1_2_1.parquet", "/tmp/a_games_log_visit_base/1_4_23.parquet", "/tmp/a_= games_log_visit_base/1_8_22.parquet", "/tmp/a_games_log_visit_base/1_3_11.p= arquet", "/tmp/a_games_log_visit_base/1_6_30.parquet", "/tmp/a_games_log_vi= sit_base/1_7_10.parquet", "/tmp/a_games_log_visit_base/1_1_10.parquet", "/t= mp/a_games_log_visit_base/1_0_24.parquet", "/tmp/a_games_log_visit_base/1_8= _39.parquet", "/tmp/a_games_log_visit_base/1_2_16.parquet", "/tmp/a_games_l= og_visit_base/1_7_27.parquet", "/tmp/a_games_log_visit_base/1_3_4.parquet",= "/tmp/a_games_log_visit_base/1_8_5.parquet", "/tmp/a_games_log_visit_base/= 1_8_13.parquet", "/tmp/a_games_log_visit_base/1_4_7.parquet", "/tmp/a_games= _log_visit_base/1_5_12.parquet", "/tmp/a_games_log_visit_base/1_0_9.parquet= ", "/tmp/a_games_log_visit_base/1_4_14.parquet", "/tmp/a_games_log_visit_ba= se/1_2_13.parquet", "/tmp/a_games_log_visit_base/1_0_15.parquet", "/tmp/a_g= ames_log_visit_base/1_2_4.parquet", "/tmp/a_games_log_visit_base/1_6_24.par= quet", "/tmp/a_games_log_visit_base/1_7_5.parquet", "/tmp/a_games_log_visit= _base/1_6_2.parquet", "/tmp/a_games_log_visit_base/1_1_13.parquet", "/tmp/a= _games_log_visit_base/1_1_1.parquet", "/tmp/a_games_log_visit_base/1_3_7.pa= rquet", "/tmp/a_games_log_visit_base/1_0_12.parquet", "/tmp/a_games_log_vis= it_base/1_0_3.parquet", "/tmp/a_games_log_visit_base/1_0_29.parquet", "/tmp= /a_games_log_visit_base/1_1_24.parquet", "/tmp/a_games_log_visit_base/1_6_1= 8.parquet", "/tmp/a_games_log_visit_base/1_2_19.parquet", "/tmp/a_games_log= _visit_base/1_3_16.parquet", "/tmp/a_games_log_visit_base/1_6_27.parquet", = "/tmp/a_games_log_visit_base/1_4_11.parquet", "/tmp/a_games_log_visit_base/= 1_7_15.parquet", "/tmp/a_games_log_visit_base/1_5_2.parquet", "/tmp/a_games= _log_visit_base/1_8_10.parquet", "/tmp/a_games_log_visit_base/1_6_7.parquet= ", "/tmp/a_games_log_visit_base/1_8_2.parquet", "/tmp/a_games_log_visit_bas= e/1_2_9.parquet", "/tmp/a_games_log_visit_base/1_7_18.parquet", "/tmp/a_gam= es_log_visit_base/1_3_20.parquet", "/tmp/a_games_log_visit_base/1_7_31.parq= uet", "/tmp/a_games_log_visit_base/1_1_16.parquet", "/tmp/a_games_log_visit= _base/1_6_35.parquet", "/tmp/a_games_log_visit_base/1_1_4.parquet", "/tmp/a= _games_log_visit_base/1_4_19.parquet", "/tmp/a_games_log_visit_base/1_0_6.p= arquet", "/tmp/a_games_log_visit_base/1_8_18.parquet", "/tmp/a_games_log_vi= sit_base/1_7_8.parquet", "/tmp/a_games_log_visit_base/1_5_15.parquet", "/tm= p/a_games_log_visit_base/1_4_20.parquet", "/tmp/a_games_log_visit_base/1_0_= 21.parquet", "/tmp/a_games_log_visit_base/1_8_31.parquet", "/tmp/a_games_lo= g_visit_base/1_4_4.parquet", "/tmp/a_games_log_visit_base/1_3_19.parquet", = "/tmp/a_games_log_visit_base/1_1_21.parquet", "/tmp/a_games_log_visit_base/= 1_2_18.parquet", "/tmp/a_games_log_visit_base/1_8_20.parquet", "/tmp/a_game= s_log_visit_base/1_8_37.parquet", "/tmp/a_games_log_visit_base/1_0_13.parqu= et", "/tmp/a_games_log_visit_base/1_0_4.parquet", "/tmp/a_games_log_visit_b= ase/1_6_13.parquet", "/tmp/a_games_log_visit_base/1_0_22.parquet", "/tmp/a_= games_log_visit_base/1_1_19.parquet", "/tmp/a_games_log_visit_base/1_3_2.pa= rquet", "/tmp/a_games_log_visit_base/1_7_29.parquet", "/tmp/a_games_log_vis= it_base/1_5_1.parquet", "/tmp/a_games_log_visit_base/1_8_11.parquet", "/tmp= /a_games_log_visit_base/1_4_25.parquet", "/tmp/a_games_log_visit_base/1_6_3= 2.parquet", "/tmp/a_games_log_visit_base/1_7_12.parquet", "/tmp/a_games_log= _visit_base/1_7_0.parquet", "/tmp/a_games_log_visit_base/1_6_4.parquet", "/= tmp/a_games_log_visit_base/1_6_22.parquet", "/tmp/a_games_log_visit_base/1_= 4_5.parquet", "/tmp/a_games_log_visit_base/1_8_3.parquet", "/tmp/a_games_lo= g_visit_base/1_1_11.parquet", "/tmp/a_games_log_visit_base/1_1_3.parquet", = "/tmp/a_games_log_visit_base/1_8_34.parquet", "/tmp/a_games_log_visit_base/= 1_2_6.parquet", "/tmp/a_games_log_visit_base/1_5_14.parquet", "/tmp/a_games= _log_visit_base/1_0_7.parquet", "/tmp/a_games_log_visit_base/1_4_16.parquet= ", "/tmp/a_games_log_visit_base/1_6_10.parquet", "/tmp/a_games_log_visit_ba= se/1_7_3.parquet", "/tmp/a_games_log_visit_base/1_0_27.parquet", "/tmp/a_ga= mes_log_visit_base/1_0_1.parquet", "/tmp/a_games_log_visit_base/1_0_10.parq= uet", "/tmp/a_games_log_visit_base/1_5_4.parquet", "/tmp/a_games_log_visit_= base/1_8_19.parquet", "/tmp/a_games_log_visit_base/1_4_22.parquet", "/tmp/a= _games_log_visit_base/1_7_24.parquet", "/tmp/a_games_log_visit_base/1_8_8.p= arquet", "/tmp/a_games_log_visit_base/1_8_25.parquet", "/tmp/a_games_log_vi= sit_base/1_6_9.parquet", "/tmp/a_games_log_visit_base/1_5_17.parquet", "/tm= p/a_games_log_visit_base/1_3_5.parquet", "/tmp/a_games_log_visit_base/1_7_3= 0.parquet", "/tmp/a_games_log_visit_base/1_3_14.parquet", "/tmp/a_games_log= _visit_base/1_1_6.parquet", "/tmp/a_games_log_visit_base/1_6_16.parquet", "= /tmp/a_games_log_visit_base/1_2_0.parquet", "/tmp/a_games_log_visit_base/1_= 5_21.parquet", "/tmp/a_games_log_visit_base/1_1_14.parquet", "/tmp/a_games_= log_visit_base/1_7_33.parquet", "/tmp/a_games_log_visit_base/1_8_28.parquet= ", "/tmp/a_games_log_visit_base/1_0_18.parquet", "/tmp/a_games_log_visit_ba= se/1_4_13.parquet", "/tmp/a_games_log_visit_base/1_7_21.parquet", "/tmp/a_g= ames_log_visit_base/1_2_3.parquet", "/tmp/a_games_log_visit_base/1_3_17.par= quet", "/tmp/a_games_log_visit_base/1_4_2.parquet", "/tmp/a_games_log_visit= _base/1_8_16.parquet", "/tmp/a_games_log_visit_base/1_6_25.parquet", "/tmp/= a_games_log_visit_base/1_8_0.parquet", "/tmp/a_games_log_visit_base/1_2_10.= parquet", "/tmp/a_games_log_visit_base/1_6_1.parquet", "/tmp/a_games_log_vi= sit_base/1_7_6.parquet", "/tmp/a_games_log_visit_base/1_3_8.parquet", "/tmp= /a_games_log_visit_base/1_8_9.parquet", "/tmp/a_games_log_visit_base/1_3_15= .parquet", "/tmp/a_games_log_visit_base/1_7_23.parquet", "/tmp/a_games_log_= visit_base/1_6_28.parquet", "/tmp/a_games_log_visit_base/1_4_0.parquet", "/= tmp/a_games_log_visit_base/1_4_27.parquet", "/tmp/a_games_log_visit_base/1_= 8_35.parquet", "/tmp/a_games_log_visit_base/1_6_19.parquet", "/tmp/a_games_= log_visit_base/1_0_28.parquet", "/tmp/a_games_log_visit_base/1_5_16.parquet= ", "/tmp/a_games_log_visit_base/1_7_14.parquet", "/tmp/a_games_log_visit_ba= se/1_5_22.parquet", "/tmp/a_games_log_visit_base/1_8_26.parquet", "/tmp/a_g= ames_log_visit_base/1_6_34.parquet", "/tmp/a_games_log_visit_base/1_1_5.par= quet", "/tmp/a_games_log_visit_base/1_1_23.parquet", "/tmp/a_games_log_visi= t_base/1_6_6.parquet", "/tmp/a_games_log_visit_base/1_5_3.parquet", "/tmp/a= _games_log_visit_base/1_0_11.parquet", "/tmp/a_games_log_visit_base/1_0_2.p= arquet", "/tmp/a_games_log_visit_base/1_4_10.parquet", "/tmp/a_games_log_vi= sit_base/1_2_8.parquet", "/tmp/a_games_log_visit_base/1_0_19.parquet", "/tm= p/a_games_log_visit_base/1_7_32.parquet", "/tmp/a_games_log_visit_base/1_8_= 29.parquet", "/tmp/a_games_log_visit_base/1_8_1.parquet", "/tmp/a_games_log= _visit_base/1_6_20.parquet", "/tmp/a_games_log_visit_base/1_7_17.parquet", = "/tmp/a_games_log_visit_base/1_3_0.parquet", "/tmp/a_games_log_visit_base/1= _1_17.parquet", "/tmp/a_games_log_visit_base/1_4_18.parquet", "/tmp/a_games= _log_visit_base/1_7_9.parquet", "/tmp/a_games_log_visit_base/1_8_32.parquet= ", "/tmp/a_games_log_visit_base/1_7_20.parquet", "/tmp/a_ | > +------+------+ > 1 row selected (0.503 seconds) > {noformat} > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)