hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vlad Gudikov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition
Date Fri, 21 Jul 2017 12:10:00 GMT
Vlad Gudikov created HIVE-17148:
-----------------------------------

             Summary: Incorrect result for Hive join query with COALESCE in WHERE condition
                 Key: HIVE-17148
                 URL: https://issues.apache.org/jira/browse/HIVE-17148
             Project: Hive
          Issue Type: Bug
          Components: CBO
    Affects Versions: 2.1.1
         Environment: {color:red}colored text{color}
            Reporter: Vlad Gudikov


The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:

STEPS TO REPRODUCE:

{code}
Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query 
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
{code}

ACTUAL RESULT:
{code}
The query returns nothing;
{code}

EXPECTED RESULT:
{code}
1       NULL    1
{code}

The issue seems to be because of the incorrect query plan. In the plan we can see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows is any column mentioned
in the COALESCE has null value.
Please find the query plan below:

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Map 1
      File Output Operator [FS_10]
        Map Join Operator [MAPJOIN_15] (rows=1 width=4)
          Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
        <-Map 2 [BROADCAST_EDGE]
          BROADCAST [RS_7]
            PartitionCols:_col0
            Select Operator [SEL_5] (rows=1 width=1)
              Output:["_col0"]
              Filter Operator [FIL_14] (rows=1 width=1)
                predicate:a2 is not null
                TableScan [TS_3] (rows=1 width=1)
                  default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
        <-Select Operator [SEL_2] (rows=1 width=4)
            Output:["_col0","_col1"]
            Filter Operator [FIL_13] (rows=1 width=4)
              predicate:{color:red}(a1 is not null and b1 is not null){color}
              TableScan [TS_0] (rows=1 width=4)
                default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
{code}

This happens only if join is inner type? otherwise HiveJoinAddNotRule which creates whis problem
is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message