hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Frank Luo <j...@merkleinc.com>
Subject multiple selects on a left join give incorrect result
Date Tue, 03 May 2016 22:58:01 GMT
All,

I have found that when doing a multiple selects on a left join, the “on” clause seems
to be ignored!!! (It is hard to believe).

Below is a very simple test case and please tell me I am crazy. I am on hdp 2.3.4.7.


CREATE TABLE T_A ( id    STRING, val   STRING );
CREATE TABLE T_B ( id    STRING, val   STRING );
CREATE TABLE join_result_1 ( ida    STRING, vala   STRING, idb    STRING, valb   STRING );
CREATE TABLE join_result_2 ( ida    STRING, vala   STRING, idb    STRING, valb   STRING );
CREATE TABLE join_result_3 ( ida    STRING, vala   STRING, idb    STRING, valb   STRING );

INSERT INTO TABLE T_A
VALUES ('Id_1', 'val_101'), ('Id_2', 'val_102'), ('Id_3', 'val_103');

INSERT INTO TABLE T_B
VALUES ('Id_1', 'val_103'), ('Id_2', 'val_104');

FROM T_A a LEFT JOIN T_B b ON a.id = b.id
INSERT OVERWRITE TABLE join_result_1
   SELECT a.*, b.*
    WHERE b.id = 'Id_1' AND b.val = 'val_103'
INSERT OVERWRITE TABLE join_result_2
   SELECT a.*, b.*
    WHERE b.val IS NULL OR (b.id = 'Id_3' AND b.val = 'val_101')
INSERT OVERWRITE TABLE join_result_3
   SELECT a.*, b.*
    WHERE b.val = 'val_104' AND b.id = 'Id_2' AND a.val <> b.val;


And here is the result:

0: jdbc:hive2 > select * from join_result_1;
+--------------------+---------------------+--------------------+---------------------+--+
| join_result_1.ida  | join_result_1.vala  | join_result_1.idb  | join_result_1.valb  |
+--------------------+---------------------+--------------------+---------------------+--+
| Id_1               | val_101             | Id_1               | val_103             |
| Id_2               | val_102             | Id_1               | val_103             |
| Id_3               | val_103             | Id_1               | val_103             |
+--------------------+---------------------+--------------------+---------------------+--+
3 rows selected (0.057 seconds)




I am expecting join_result_1 to have one row, but got three!!!

Has other people run into the same thing?

Join us at Merkle’s 2016 annual Performance Marketer Executive Summit – June 7 – 9 in
Memphis, TN
<http://www2.merkleinc.com/l/47252/2016-04-26/3lbfdc>

Download the latest installment of our annual Marketing Imperatives, “Winning with People-Based
Marketing”<http://www2.merkleinc.com/l/47252/2016-04-26/3lbfd1>

This email and any attachments transmitted with it are intended for use by the intended recipient(s)
only. If you have received this email in error, please notify the sender immediately and then
delete it. If you are not the intended recipient, you must not keep, use, disclose, copy or
distribute this email without the author’s prior permission. We take precautions to minimize
the risk of transmitting software viruses, but we advise you to perform your own virus checks
on any attachment to this message. We cannot accept liability for any loss or damage caused
by software viruses. The information contained in this communication may be confidential and
may be subject to the attorney-client privilege.

Mime
View raw message