hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chao Sun (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16698) HoS should avoid mapjoin optimization in case of union and using table stats
Date Wed, 17 May 2017 17:17:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Chao Sun updated HIVE-16698:
----------------------------
    Status: Patch Available  (was: Open)

> HoS should avoid mapjoin optimization in case of union and using table stats
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-16698
>                 URL: https://issues.apache.org/jira/browse/HIVE-16698
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer, Spark
>    Affects Versions: 3.0.0
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>         Attachments: HIVE-16698.1.patch
>
>
> When {{hive.spark.use.ts.stats.for.mapjoin}} is true, HoS would not check whether the
big table branch has upstream UNION operators. This is wrong and could generate incorrect
plan. To reproduce:
> {code}
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=16;
> set hive.spark.use.ts.stats.for.mapjoin=true;
> create table a (c1 string, c2 int);
> create table b (c3 string, c4 int);
> create table c (c1 string, c2 int);
> create table d (c3 string, c4 int);
> create table e (c5 string, c6 int);
> insert into table a values
> ("a1", 1), ("a2", 2), ("a3", 3), ("a4", 4), ("a5", 5), ("a6", 6), ("a7", 7);
> insert into table b values
> ("b1", 1), ("b2", 2), ("b3", 3), ("b4", 4);
> insert into table c values
> ("c1", 1), ("c2", 2), ("c3", 3), ("c4", 4), ("c5", 5), ("c6", 6), ("c7", 7);
> insert into table d values
> ("d1", 1), ("d2", 2), ("d3", 3), ("d4", 4);
> insert into table e values
> ("d1", 1), ("d2", 2);
> explain
> with t1 as (
> select a.c1 as c1, a.c2 as c2, b.c3 as c3 from a join b on a.c2 = b.c4
> ),
> t2 as (
> select c.c1 as c1, c.c2 as c2, d.c3 as c3 from c join d on c.c2 = d.c4
> ),
> t3 as (
> select * from t1 union all select * from t2
> ),
> t4 as (
> select t3.c1, t3.c3, t5.c5 from t3 join e as t5 on t3.c2 = t5.c6
> )
> select * from t4;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message