Mailing-List: contact issues-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Tue, 30 May 2017 22:31:04 +0000 (UTC)
From: "Gopal V (JIRA)" <jira@apache.org>
To: issues@hive.apache.org
Message-ID: <JIRA.13076004.1496183326000.325591.1496183464293@Atlassian.JIRA>
In-Reply-To: <JIRA.13076004.1496183326000@Atlassian.JIRA>
References: <JIRA.13076004.1496183326000@Atlassian.JIRA> <JIRA.13076004.1496183326777@jira-lw-us.apache.org>
Subject: [jira] [Updated] (HIVE-16793) Scalar sub-query: Scalar safety
 checks for explicit group-bys
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Tue, 30 May 2017 22:31:13 -0000


     [ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-16793:
---------------------------
    Summary: Scalar sub-query: Scalar safety checks for explicit group-bys  (was: Scalar sub-query: Scalar safety checks for group-bys)

> Scalar sub-query: Scalar safety checks for explicit group-bys
> -------------------------------------------------------------
>
>                 Key: HIVE-16793
>                 URL: https://issues.apache.org/jira/browse/HIVE-16793
>             Project: Hive
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gopal V
>            Assignee: Vineet Garg
>
> This query gets an sq_count_check, even though the check is useless when the group-by key is a constant.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
>     limit:-1
>     Stage-1
>       Map 1 vectorized, llap
>       File Output Operator [FS_64]
>         Select Operator [SEL_63] (rows=66666666 width=621)
>           Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>           Filter Operator [FIL_62] (rows=66666666 width=625)
>             predicate:(_col5 > _col10)
>             Map Join Operator [MAPJOIN_61] (rows=200000000 width=625)
>               Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
>             <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>               BROADCAST [RS_58]
>                 Select Operator [SEL_57] (rows=1 width=4)
>                   Output:["_col0"]
>                   Group By Operator [GBY_56] (rows=1 width=89)
>                     Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>                   <-Map 5 [SIMPLE_EDGE] vectorized, llap
>                     SHUFFLE [RS_55]
>                       PartitionCols:_col0
>                       Group By Operator [GBY_54] (rows=86 width=89)
>                         Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
>                         Select Operator [SEL_53] (rows=1212121 width=109)
>                           Output:["_col1"]
>                           Filter Operator [FIL_52] (rows=1212121 width=109)
>                             predicate:(p_type = '1')
>                             TableScan [TS_17] (rows=200000000 width=109)
>                               tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
>             <-Map Join Operator [MAPJOIN_60] (rows=200000000 width=621)
>                 Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>               <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
>                 BROADCAST [RS_51]
>                   Select Operator [SEL_50] (rows=1 width=8)
>                     Filter Operator [FIL_49] (rows=1 width=8)
>                       predicate:(sq_count_check(_col0) <= 1)
>                       Group By Operator [GBY_48] (rows=1 width=8)
>                         Output:["_col0"],aggregations:["count(VALUE._col0)"]
>                       <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
>                         PARTITION_ONLY_SHUFFLE [RS_47]
>                           Group By Operator [GBY_46] (rows=1 width=8)
>                             Output:["_col0"],aggregations:["count()"]
>                             Select Operator [SEL_45] (rows=1 width=85)
>                               Group By Operator [GBY_44] (rows=1 width=85)
>                                 Output:["_col0"],keys:KEY._col0
>                               <-Map 2 [SIMPLE_EDGE] vectorized, llap
>                                 SHUFFLE [RS_43]
>                                   PartitionCols:_col0
>                                   Group By Operator [GBY_42] (rows=83 width=85)
>                                     Output:["_col0"],keys:'1'
>                                     Select Operator [SEL_41] (rows=1212121 width=105)
>                                       Filter Operator [FIL_40] (rows=1212121 width=105)
>                                         predicate:(p_type = '1')
>                                         TableScan [TS_2] (rows=200000000 width=105)
>                                           tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
>               <-Select Operator [SEL_59] (rows=200000000 width=621)
>                   Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>                   TableScan [TS_0] (rows=200000000 width=621)
>                     tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]
> {code}
> The other version, without the filter, is missing the check, even though the compiler cannot assume that the nDV (number of distinct values) of p_type is 1.
> {code}
> hive> explain  select * from part where p_size > (select max(p_size) from part group by p_type);
> Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 3 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
>     limit:-1
>     Stage-1
>       Map 1 vectorized, llap
>       File Output Operator [FS_26]
>         Select Operator [SEL_25] (rows=11000000000 width=621)
>           Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>           Filter Operator [FIL_24] (rows=11000000000 width=625)
>             predicate:(_col5 > _col9)
>             Map Join Operator [MAPJOIN_23] (rows=33000000000 width=625)
>               Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
>             <-Reducer 3 [BROADCAST_EDGE] vectorized, llap
>               BROADCAST [RS_21]
>                 Select Operator [SEL_20] (rows=165 width=4)
>                   Output:["_col0"]
>                   Group By Operator [GBY_19] (rows=165 width=109)
>                     Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>                   <-Map 2 [SIMPLE_EDGE] vectorized, llap
>                     SHUFFLE [RS_18]
>                       PartitionCols:_col0
>                       Group By Operator [GBY_17] (rows=14190 width=109)
>                         Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type
>                         Select Operator [SEL_16] (rows=200000000 width=109)
>                           Output:["p_type","p_size"]
>                           TableScan [TS_2] (rows=200000000 width=109)
>                             tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
>             <-Select Operator [SEL_22] (rows=200000000 width=621)
>                 Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>                 TableScan [TS_0] (rows=200000000 width=621)
>                   tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]
> {code}
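
The rule the two plans point at can be sketched roughly as follows. This is only an illustration of the reasoning in the description, not Hive's implementation: the function name `needs_row_count_check` and its two parameters are hypothetical. It assumes that a group-by key pinned to a single constant by an equality filter cannot produce more than one group, so a runtime row-count check is only needed when at least one key is left unpinned.

```python
def needs_row_count_check(group_by_keys, constant_columns):
    """Return True when a scalar sub-query could yield more than one row.

    group_by_keys: columns in the sub-query's explicit GROUP BY (hypothetical input).
    constant_columns: columns fixed to one value by an equality filter,
        e.g. p_type when the sub-query has p_type = '1' (hypothetical input).
    """
    # Keys pinned to a constant cannot split the input into several groups.
    free_keys = [key for key in group_by_keys if key not in constant_columns]
    # Any remaining key may have more than one distinct value, so the
    # sub-query may return several rows and needs the runtime check.
    return len(free_keys) > 0


# First query: group by p_type with the filter p_type = '1'.
print(needs_row_count_check(["p_type"], {"p_type"}))
# Second query: group by p_type with no filter.
print(needs_row_count_check(["p_type"], set()))
```

Under these assumptions the first query would not need the check and the second would, which is the opposite of what the two plans above show.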

