hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Navis류승우 <navis....@nexr.com>
Subject Re: full outer join result
Date Thu, 13 Mar 2014 01:03:26 GMT
The 100 100 is the sole matching row for the join condition, so it
would be right result,

NULL    NULL    NULL    40
NULL    NULL    12      35
NULL    NULL    48      NULL
NULL    40      NULL    NULL
12      35      NULL    NULL
48      NULL    NULL    NULL
100     100     100     100

It's fixed by HIVE-3411 and HIVE-3381

2014-03-13 5:38 GMT+09:00 Stephen Sprague <spragues@gmail.com>:
> well. i had some free time to search it.  from here:
> http://www.postgresql.org/docs/9.3/static/sql-select.html#SQL-UNION you'll
> see the default is indeed UNION DISTINCT.  so changing it to UNION ALL
> you'll get different results - are they the ones you're expecting?
>
>
> On Wed, Mar 12, 2014 at 9:36 AM, Stephen Sprague <spragues@gmail.com> wrote:
>>
>> interesting.    don't know the answer but could you change the UNION in
>> the Postgres to UNION ALL?  I'd be curious if the default is UNION DISTINCT
>> on that platform. That would at least partially explain postgres behaviour
>> leaving hive the odd man out.
>>
>>
>>
>> On Wed, Mar 12, 2014 at 6:47 AM, Martin Kudlej <mkudlej@redhat.com> wrote:
>>>
>>> Hi all,
>>>
>>> I've tried BigTop test for join_filters:
>>> CREATE TABLE myinput1(key int, value int);
>>> LOAD DATA LOCAL INPATH 'seed_data_files/in3.txt' INTO TABLE myinput1;
>>>
>>> where seed_data_files/in3.txt:
>>> 12      35
>>> NULL    40
>>> 48      NULL
>>> 100     100
>>>
>>> I've tried:
>>> SELECT * FROM myinput1 a FULL OUTER JOIN myinput1 b on a.key > 40 AND
>>> a.value > 50 AND a.key = a.value AND b.key > 40 AND b.value > 50 AND
b.key =
>>> b.value ORDER BY a.key, a.value, b.key, b.value;
>>>
>>> and expected result in test is:
>>> NULL  NULL  NULL  40
>>> NULL  NULL  NULL  40
>>> NULL  NULL  NULL  40
>>> NULL  NULL  NULL  40
>>> NULL  NULL  12  35
>>> NULL  NULL  12  35
>>> NULL  NULL  12  35
>>> NULL  NULL  12  35
>>> NULL  NULL  48  NULL
>>> NULL  NULL  48  NULL
>>> NULL  NULL  48  NULL
>>> NULL  NULL  48  NULL
>>> NULL  40  NULL  NULL
>>> NULL  40  NULL  NULL
>>> NULL  40  NULL  NULL
>>> NULL  40  NULL  NULL
>>> 12  35  NULL  NULL
>>> 12  35  NULL  NULL
>>> 12  35  NULL  NULL
>>> 12  35  NULL  NULL
>>> 48  NULL  NULL  NULL
>>> 48  NULL  NULL  NULL
>>> 48  NULL  NULL  NULL
>>> 48  NULL  NULL  NULL
>>> 100 100 NULL  NULL
>>> 100 100 NULL  NULL
>>> 100 100 NULL  NULL
>>> 100 100 100 100
>>>
>>>
>>> but real hive result is:
>>> NULL    NULL    NULL    40
>>> NULL    NULL    12      35
>>> NULL    NULL    48      NULL
>>> NULL    40      NULL    NULL
>>> 12      35      NULL    NULL
>>> 48      NULL    NULL    NULL
>>> 100     100     100     100
>>>
>>> btw. result from postgresql is:
>>> (SELECT *
>>>   FROM myinput1 a
>>> LEFT JOIN
>>>   myinput1 b on
>>> a.key > 40 AND
>>>     a.value > 50 AND
>>>     a.key = a.value AND
>>>     b.key > 40 AND
>>>     b.value > 50 AND
>>>     b.key = b.value  ORDER BY a.key, a.value, b.key, b.value)
>>> UNION (SELECT *
>>>   FROM myinput1 a
>>> RIGHT JOIN
>>>   myinput1 b on
>>> a.key > 40 AND
>>>     a.value > 50 AND
>>>     a.key = a.value AND
>>>     b.key > 40 AND
>>>     b.value > 50 AND
>>>     b.key = b.value
>>> ORDER BY a.key, a.value, b.key, b.value);
>>>      |       |  12 |    35
>>>   12 |    35 |     |
>>>      |       |  48 |
>>>   48 |       |     |
>>>      |    40 |     |
>>>      |       |     |    40
>>>  100 |   100 | 100 |   100
>>>
>>> so it's the same like in hive.
>>>
>>> What is the right result for this full outer join in HiveQL, please?
>>>
>>> --
>>> Best Regards,
>>> Martin Kudlej.
>>> MRG/Grid & RHS-Hadoop Senior Quality Assurance Engineer
>>> Red Hat Czech s.r.o.
>>>
>>> Phone: +420 532 294 155
>>> E-mail:mkudlej at redhat.com
>>> IRC:   mkudlej at #brno, #messaging, #grid, #rhs, #distcomp
>>
>>
>

Mime
View raw message