hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephen Sprague <sprag...@gmail.com>
Subject Re: full outer join result
Date Wed, 12 Mar 2014 16:36:12 GMT
interesting.    don't know the answer but could you change the UNION in the
Postgres to UNION ALL?  I'd be curious if the default is UNION DISTINCT on
that platform. That would at least partially explain postgres behaviour
leaving hive the odd man out.



On Wed, Mar 12, 2014 at 6:47 AM, Martin Kudlej <mkudlej@redhat.com> wrote:

> Hi all,
>
> I've tried BigTop test for join_filters:
> CREATE TABLE myinput1(key int, value int);
> LOAD DATA LOCAL INPATH 'seed_data_files/in3.txt' INTO TABLE myinput1;
>
> where seed_data_files/in3.txt:
> 12      35
> NULL    40
> 48      NULL
> 100     100
>
> I've tried:
> SELECT * FROM myinput1 a FULL OUTER JOIN myinput1 b on a.key > 40 AND
> a.value > 50 AND a.key = a.value AND b.key > 40 AND b.value > 50 AND b.key
> = b.value ORDER BY a.key, a.value, b.key, b.value;
>
> and expected result in test is:
> NULL  NULL  NULL  40
> NULL  NULL  NULL  40
> NULL  NULL  NULL  40
> NULL  NULL  NULL  40
> NULL  NULL  12  35
> NULL  NULL  12  35
> NULL  NULL  12  35
> NULL  NULL  12  35
> NULL  NULL  48  NULL
> NULL  NULL  48  NULL
> NULL  NULL  48  NULL
> NULL  NULL  48  NULL
> NULL  40  NULL  NULL
> NULL  40  NULL  NULL
> NULL  40  NULL  NULL
> NULL  40  NULL  NULL
> 12  35  NULL  NULL
> 12  35  NULL  NULL
> 12  35  NULL  NULL
> 12  35  NULL  NULL
> 48  NULL  NULL  NULL
> 48  NULL  NULL  NULL
> 48  NULL  NULL  NULL
> 48  NULL  NULL  NULL
> 100 100 NULL  NULL
> 100 100 NULL  NULL
> 100 100 NULL  NULL
> 100 100 100 100
>
>
> but real hive result is:
> NULL    NULL    NULL    40
> NULL    NULL    12      35
> NULL    NULL    48      NULL
> NULL    40      NULL    NULL
> 12      35      NULL    NULL
> 48      NULL    NULL    NULL
> 100     100     100     100
>
> btw. result from postgresql is:
> (SELECT *
>   FROM myinput1 a
> LEFT JOIN
>   myinput1 b on
> a.key > 40 AND
>     a.value > 50 AND
>     a.key = a.value AND
>     b.key > 40 AND
>     b.value > 50 AND
>     b.key = b.value  ORDER BY a.key, a.value, b.key, b.value)
> UNION (SELECT *
>   FROM myinput1 a
> RIGHT JOIN
>   myinput1 b on
> a.key > 40 AND
>     a.value > 50 AND
>     a.key = a.value AND
>     b.key > 40 AND
>     b.value > 50 AND
>     b.key = b.value
> ORDER BY a.key, a.value, b.key, b.value);
>      |       |  12 |    35
>   12 |    35 |     |
>      |       |  48 |
>   48 |       |     |
>      |    40 |     |
>      |       |     |    40
>  100 |   100 | 100 |   100
>
> so it's the same like in hive.
>
> What is the right result for this full outer join in HiveQL, please?
>
> --
> Best Regards,
> Martin Kudlej.
> MRG/Grid & RHS-Hadoop Senior Quality Assurance Engineer
> Red Hat Czech s.r.o.
>
> Phone: +420 532 294 155
> E-mail:mkudlej at redhat.com
> IRC:   mkudlej at #brno, #messaging, #grid, #rhs, #distcomp
>

Mime
View raw message