hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vlad Gudikov <>
Subject Incorrect result for Hive join query with COALESCE in WHERE condition
Date Mon, 24 Jul 2017 08:51:12 GMT
The issue seems to be because of the incorrect query plan. In the plan we
can see:
predicate:(a1 is not null and b1 is not null, see queries below). This
predicate is created with HiveJoinAddNotNull, which is adding predicate
that is checking that all fields that take part in filter operator are not
null. I'm curious what is the purpose of this rule, after creating such
predicate we can't use functions that can process NULLs and give valid
result as filters. See steps to reproduce. I would really appreciate any
help in order to understand this case.


Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;


The query returns nothing;


1       NULL    1

The issue seems to be because of the incorrect query plan. In the plan we
can see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it is filtering out all the rows
is any column mentioned in the COALESCE has null value.
Please find the query plan below:

Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

  Fetch Operator
      Map 1
      File Output Operator [FS_10]
        Map Join Operator [MAPJOIN_15] (rows=1 width=4)
        <-Map 2 [BROADCAST_EDGE]
          BROADCAST [RS_7]
            Select Operator [SEL_5] (rows=1 width=1)
              Filter Operator [FIL_14] (rows=1 width=1)
                predicate:a2 is not null
                TableScan [TS_3] (rows=1 width=1)
        <-Select Operator [SEL_2] (rows=1 width=4)
            Filter Operator [FIL_13] (rows=1 width=4)
              predicate:{color:red}(a1 is not null and b1 is not null){color}
              TableScan [TS_0] (rows=1 width=4)

This happens only if join is inner type, otherwise HiveJoinAddNotRule which
creates this problem is skipped.

View raw message