Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 1F707200CA7 for ; Wed, 31 May 2017 00:31:13 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 1DC35160BDC; Tue, 30 May 2017 22:31:13 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 3D57E160BC9 for ; Wed, 31 May 2017 00:31:12 +0200 (CEST) Received: (qmail 60929 invoked by uid 500); 30 May 2017 22:31:11 -0000 Mailing-List: contact issues-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list issues@hive.apache.org Received: (qmail 60920 invoked by uid 99); 30 May 2017 22:31:11 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 30 May 2017 22:31:11 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 9D8D7C061B for ; Tue, 30 May 2017 22:31:10 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.201 X-Spam-Level: X-Spam-Status: No, score=-99.201 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id Fmh1ty6ZufAt for ; Tue, 30 May 2017 22:31:06 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 83CCA5FDAD for ; Tue, 30 May 2017 22:31:05 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id E85EFE0D77 for ; Tue, 30 May 2017 22:31:04 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 483C921B5E for ; Tue, 30 May 2017 22:31:04 +0000 (UTC) Date: Tue, 30 May 2017 22:31:04 +0000 (UTC) From: "Gopal V (JIRA)" To: issues@hive.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (HIVE-16793) Scalar sub-query: Scalar safety checks for explicit group-bys MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Tue, 30 May 2017 22:31:13 -0000 [ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-16793: --------------------------- Summary: Scalar sub-query: Scalar safety checks for explicit group-bys (was: Scalar sub-query: Scalar safety checks for group-bys) > Scalar sub-query: Scalar safety checks for explicit group-bys > ------------------------------------------------------------- > > Key: HIVE-16793 > URL: https://issues.apache.org/jira/browse/HIVE-16793 > Project: Hive > Issue Type: Bug > Components: SQL > Affects Versions: 3.0.0 > Reporter: Gopal V > Assignee: Vineet Garg > > This query has an sq_count check, though is useless on a constant key. > {code} > hive> explain select * from part where p_size > (select max(p_size) from part where p_type = '1' group by p_type); > Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product > Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. > Vertex dependency in root stage > Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE) > Reducer 6 <- Map 5 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 vectorized, llap > File Output Operator [FS_64] > Select Operator [SEL_63] (rows=66666666 width=621) > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > Filter Operator [FIL_62] (rows=66666666 width=625) > predicate:(_col5 > _col10) > Map Join Operator [MAPJOIN_61] (rows=200000000 width=625) > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"] > <-Reducer 6 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_58] > Select Operator [SEL_57] (rows=1 width=4) > Output:["_col0"] > Group By Operator [GBY_56] (rows=1 width=89) > Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 > <-Map 5 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_55] > PartitionCols:_col0 > Group By Operator [GBY_54] (rows=86 width=89) > Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1' > Select Operator [SEL_53] (rows=1212121 width=109) > Output:["_col1"] > Filter Operator [FIL_52] (rows=1212121 width=109) > predicate:(p_type = '1') > TableScan [TS_17] (rows=200000000 width=109) > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] > <-Map Join Operator [MAPJOIN_60] (rows=200000000 width=621) > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > <-Reducer 4 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_51] > Select Operator [SEL_50] (rows=1 width=8) > Filter Operator [FIL_49] (rows=1 width=8) > predicate:(sq_count_check(_col0) <= 1) > Group By Operator [GBY_48] (rows=1 width=8) > Output:["_col0"],aggregations:["count(VALUE._col0)"] > <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap > PARTITION_ONLY_SHUFFLE [RS_47] > Group By Operator [GBY_46] (rows=1 width=8) > Output:["_col0"],aggregations:["count()"] > Select Operator [SEL_45] (rows=1 width=85) > Group By Operator [GBY_44] (rows=1 width=85) > Output:["_col0"],keys:KEY._col0 > <-Map 2 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_43] > PartitionCols:_col0 > Group By Operator [GBY_42] (rows=83 width=85) > Output:["_col0"],keys:'1' > Select Operator [SEL_41] (rows=1212121 width=105) > Filter Operator [FIL_40] (rows=1212121 width=105) > predicate:(p_type = '1') > TableScan [TS_2] (rows=200000000 width=105) > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"] > <-Select Operator [SEL_59] (rows=200000000 width=621) > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > TableScan [TS_0] (rows=200000000 width=621) > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"] > {code} > The other version without the filter is missing the check, though the compiler cannot assume the nDV of p_type. > {code} > hive> explain select * from part where p_size > (select max(p_size) from part group by p_type); > Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. > Vertex dependency in root stage > Map 1 <- Reducer 3 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 vectorized, llap > File Output Operator [FS_26] > Select Operator [SEL_25] (rows=11000000000 width=621) > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > Filter Operator [FIL_24] (rows=11000000000 width=625) > predicate:(_col5 > _col9) > Map Join Operator [MAPJOIN_23] (rows=33000000000 width=625) > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"] > <-Reducer 3 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_21] > Select Operator [SEL_20] (rows=165 width=4) > Output:["_col0"] > Group By Operator [GBY_19] (rows=165 width=109) > Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 > <-Map 2 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_18] > PartitionCols:_col0 > Group By Operator [GBY_17] (rows=14190 width=109) > Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type > Select Operator [SEL_16] (rows=200000000 width=109) > Output:["p_type","p_size"] > TableScan [TS_2] (rows=200000000 width=109) > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] > <-Select Operator [SEL_22] (rows=200000000 width=621) > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > TableScan [TS_0] (rows=200000000 width=621) > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"] > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)