hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephen Sprague <sprag...@gmail.com>
Subject Re: full outer join result
Date Wed, 12 Mar 2014 20:38:01 GMT
well. i had some free time to search it.  from here:
http://www.postgresql.org/docs/9.3/static/sql-select.html#SQL-UNION you'll
see the default is indeed UNION DISTINCT.  so changing it to UNION ALL
you'll get different results - are they the ones you're expecting?


On Wed, Mar 12, 2014 at 9:36 AM, Stephen Sprague <spragues@gmail.com> wrote:

> interesting.    don't know the answer but could you change the UNION in
> the Postgres to UNION ALL?  I'd be curious if the default is UNION DISTINCT
> on that platform. That would at least partially explain postgres behaviour
> leaving hive the odd man out.
>
>
>
> On Wed, Mar 12, 2014 at 6:47 AM, Martin Kudlej <mkudlej@redhat.com> wrote:
>
>> Hi all,
>>
>> I've tried BigTop test for join_filters:
>> CREATE TABLE myinput1(key int, value int);
>> LOAD DATA LOCAL INPATH 'seed_data_files/in3.txt' INTO TABLE myinput1;
>>
>> where seed_data_files/in3.txt:
>> 12      35
>> NULL    40
>> 48      NULL
>> 100     100
>>
>> I've tried:
>> SELECT * FROM myinput1 a FULL OUTER JOIN myinput1 b on a.key > 40 AND
>> a.value > 50 AND a.key = a.value AND b.key > 40 AND b.value > 50 AND b.key
>> = b.value ORDER BY a.key, a.value, b.key, b.value;
>>
>> and expected result in test is:
>> NULL  NULL  NULL  40
>> NULL  NULL  NULL  40
>> NULL  NULL  NULL  40
>> NULL  NULL  NULL  40
>> NULL  NULL  12  35
>> NULL  NULL  12  35
>> NULL  NULL  12  35
>> NULL  NULL  12  35
>> NULL  NULL  48  NULL
>> NULL  NULL  48  NULL
>> NULL  NULL  48  NULL
>> NULL  NULL  48  NULL
>> NULL  40  NULL  NULL
>> NULL  40  NULL  NULL
>> NULL  40  NULL  NULL
>> NULL  40  NULL  NULL
>> 12  35  NULL  NULL
>> 12  35  NULL  NULL
>> 12  35  NULL  NULL
>> 12  35  NULL  NULL
>> 48  NULL  NULL  NULL
>> 48  NULL  NULL  NULL
>> 48  NULL  NULL  NULL
>> 48  NULL  NULL  NULL
>> 100 100 NULL  NULL
>> 100 100 NULL  NULL
>> 100 100 NULL  NULL
>> 100 100 100 100
>>
>>
>> but real hive result is:
>> NULL    NULL    NULL    40
>> NULL    NULL    12      35
>> NULL    NULL    48      NULL
>> NULL    40      NULL    NULL
>> 12      35      NULL    NULL
>> 48      NULL    NULL    NULL
>> 100     100     100     100
>>
>> btw. result from postgresql is:
>> (SELECT *
>>   FROM myinput1 a
>> LEFT JOIN
>>   myinput1 b on
>> a.key > 40 AND
>>     a.value > 50 AND
>>     a.key = a.value AND
>>     b.key > 40 AND
>>     b.value > 50 AND
>>     b.key = b.value  ORDER BY a.key, a.value, b.key, b.value)
>> UNION (SELECT *
>>   FROM myinput1 a
>> RIGHT JOIN
>>   myinput1 b on
>> a.key > 40 AND
>>     a.value > 50 AND
>>     a.key = a.value AND
>>     b.key > 40 AND
>>     b.value > 50 AND
>>     b.key = b.value
>> ORDER BY a.key, a.value, b.key, b.value);
>>      |       |  12 |    35
>>   12 |    35 |     |
>>      |       |  48 |
>>   48 |       |     |
>>      |    40 |     |
>>      |       |     |    40
>>  100 |   100 | 100 |   100
>>
>> so it's the same like in hive.
>>
>> What is the right result for this full outer join in HiveQL, please?
>>
>> --
>> Best Regards,
>> Martin Kudlej.
>> MRG/Grid & RHS-Hadoop Senior Quality Assurance Engineer
>> Red Hat Czech s.r.o.
>>
>> Phone: +420 532 294 155
>> E-mail:mkudlej at redhat.com
>> IRC:   mkudlej at #brno, #messaging, #grid, #rhs, #distcomp
>>
>
>

Mime
View raw message