hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <mdefoinplatel....@orange.com>
Subject LEFT OUTER JOIN and partition
Date Thu, 22 Dec 2011 13:28:54 GMT
Hi folks,
Ok, I give you the context first :

I have seen in the mailing list people advising the use of a ... LEFT OUTER JOIN ON ... followed
by a WHERE close to write a NOT IN type of query.

So to make it clear, it should be possible to rewrite the following query:
SELECT a.id FROM a WHERE a.id NOT IN (SELECT b.id FROM b);

Into the following one:
SELECT a.id FROM a LEFT OUTER JOIN b ON (a.id=b.id) WHERE b.id IS NULL;

Ok fine, but what is going on in case of partitioned tables ?

Suppose a and b are partitioned on dt and you want to write the query only for a given partition:
SELECT a.id FROM a LEFT OUTER JOIN b ON (a.id=b.id AND a.dt=X AND b.dt=X) WHERE b.id IS NULL;

As far as I am concerned, the restriction a.dt=X does not produce the expected results and
I will get rows with a.dt different from X ...
So of course I can restrict the output to the partitions I am interested in with a WHERE clause
but this looks very odd to me.

So my questions are :

1)      Am I the only one experience this behavior ?

2)      Since the JOIN operation is performed before the WHERE one then does it really means
that the join will take place on all the partitions ( in which case this type of query will
not be tractable in practice) ?

3)      Is there any other way to write a NOT IN query in Hive and more generally how to perform
a set difference  operation in Hive ?

Sorry for the long email...

Cheers,
Michael



_________________________________________________________________________________________________________________________

Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees
et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par
erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant
susceptibles d'alteration,
France Telecom - Orange decline toute responsabilite si ce message a ete altere, deforme ou
falsifie. Merci

This message and its attachments may contain confidential or privileged information that may
be protected by law;
they should not be distributed, used or copied without authorization.
If you have received this email in error, please notify the sender and delete this message
and its attachments.
As emails may be altered, France Telecom - Orange shall not be liable if this message was
modified, changed or falsified.
Thank you.


Mime
View raw message