hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ashutosh Chauhan <hashut...@apache.org>
Subject Re: Wrong results from join query in Hive 0.13 and also 1.0 with reproduce.
Date Wed, 02 Sep 2015 09:24:04 GMT
https://issues.apache.org/jira/browse/HIVE-10841

Thanks,
Ashutosh

On Tue, Sep 1, 2015 at 6:00 PM, Jim Green <openkbinfo@gmail.com> wrote:

> Seems Hive 1.2 fixed this issue. But not sure what is the JIRA related and
> the possibility to backport this fix into Hive 0.13?
>
>
> On Tue, Sep 1, 2015 at 5:35 PM, Jim Green <openkbinfo@gmail.com> wrote:
>
>> Hi Team,
>>
>> Below is the minimum reproduce of wrong results in Hive 0.13:
>>
>> *1. Create 4 tables*
>> CREATE EXTERNAL TABLE testjoin1( joincol string );
>> CREATE EXTERNAL TABLE testjoin2(
>>    anothercol string ,
>>    joincol string);
>>
>> CREATE EXTERNAL TABLE testjoin3( anothercol string);
>>
>> CREATE EXTERNAL TABLE testjoin4(
>>   joincol string,
>>   wherecol string ,
>>   wherecol2 string);
>>
>> *2. Insert sample data *
>> (Note: Make sure you firstly create the dual table which only contains 1
>> row)
>>
>> insert into table testjoin1 select '1' from dual;
>> insert into table testjoin2 select 'another','1' from dual;
>> insert into table testjoin3 select 'another' from dual;
>> insert into table testjoin4 select '1','I_AM_MISSING','201501' from dual;
>> insert into table testjoin4 select
>> '1','I_Shouldnot_be_in_output','201501' from
>> dual;
>>
>> hive> select * from testjoin1;
>> OK
>> 1
>> Time taken: 0.04 seconds, Fetched: 1 row(s)
>>
>> hive> select * from testjoin2;
>> OK
>> another    1
>> Time taken: 0.039 seconds, Fetched: 1 row(s)
>>
>> hive> select * from testjoin3;
>> OK
>> another
>> Time taken: 0.038 seconds, Fetched: 1 row(s)
>>
>> hive> select * from testjoin4;
>> OK
>> 1    I_AM_MISSING    201501
>> 1    I_Shouldnot_be_in_output    201501
>> Time taken: 0.04 seconds, Fetched: 2 row(s)
>>
>> *3. SQL1 is returning wrong results.*
>>
>> Select testjoin4.* From
>> testjoin1
>> JOIN testjoin2
>>   ON (testjoin2.joincol = testjoin1.joincol)
>> JOIN testjoin3
>>   ON (testjoin3.anothercol= testjoin2.anothercol)
>> JOIN testjoin4
>>   ON (testjoin4.joincol = testjoin1.joincol AND
>> testjoin4.wherecol2='201501')
>> WHERE (testjoin4.wherecol='I_AM_MISSING');
>>
>> 1    I_AM_MISSING    201501
>> 1    I_Shouldnot_be_in_output    201501
>> Time taken: 21.702 seconds, Fetched: 2 row(s)
>>
>>
>> *4. SQL2 is returning good result(If we move the both filters to WHERE
>> clause )*
>>
>> Select testjoin4.* From
>> testjoin1
>> JOIN testjoin2
>>   ON (testjoin2.joincol = testjoin1.joincol)
>> JOIN testjoin3
>>   ON (testjoin3.anothercol= testjoin2.anothercol)
>> JOIN testjoin4
>>   ON (testjoin4.joincol = testjoin1.joincol)
>> WHERE (testjoin4.wherecol='I_AM_MISSING' and
>> testjoin4.wherecol2='201501');
>>
>> 1    I_AM_MISSING    201501
>> Time taken: 20.393 seconds, Fetched: 1 row(s)
>> —————
>> *Another test is done in Hive 1.0 and found both SQL1 and SQL2 are
>> returning wrong results….*
>>
>> 1 I_AM_MISSING 201501
>> 1 I_AM_MISSING 201501
>> Time taken: 13.983 seconds, Fetched: 2 row(s)
>>
>> *Anybody knows any related JIRAs?*
>>
>> --
>> Thanks,
>> www.openkb.info
>> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>>
>
>
>
> --
> Thanks,
> www.openkb.info
> (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
>

Mime
View raw message