hadoop-hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-870) semi joins
Date Mon, 09 Nov 2009 05:28:32 GMT

    [ https://issues.apache.org/jira/browse/HIVE-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774863#action_12774863

Namit Jain commented on HIVE-870:

Can you also add more tests with STREAMTABLE?

Do you want to separate out the comment changes and file a new JIRA for them?
They are blowing up the number of files and making the patch difficult to review. If you
think that will help, please file a new JIRA and submit a patch for it; I will try to
take a look at it asap.

> semi joins
> ----------
>                 Key: HIVE-870
>                 URL: https://issues.apache.org/jira/browse/HIVE-870
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: Hive-870.patch, Hive-870_2.patch
> Semi-join is an efficient way to unnest an IN/EXISTS subquery. For example,
> select * 
> from A
> where A.id IN 
>    (select id
>     from B
>     where B.date > '2009-10-01');
> returns the rows of A whose ID is in the set of IDs found in rows of B dated after
> '2009-10-01'. This query can be unnested using an INNER JOIN or a LEFT OUTER JOIN, but we
> need to deduplicate the IDs returned by the subquery on table B. The semantics of LEFT SEMI
> JOIN are that as long as there is ANY row in the right-hand table that matches the join key,
> the left-hand table row is emitted as a result, without necessarily looking further in the
> right-hand table for additional matches. This is exactly the semantics of the IN subquery.
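The difference between these unnesting strategies can be illustrated with a small in-memory sketch. It only models the semantics described above; it is not how Hive plans or executes LEFT SEMI JOIN. The sample rows in A and B and the helper names `left_semi_join` and `inner_join` are hypothetical. The semi join collects the matching right-hand keys once and emits each left row at most once, whereas a plain inner join repeats a left row for every duplicate match unless the subquery's IDs are deduplicated first.

```python
# Hypothetical sample tables: A(id, name) and B(id, date).
A = [(1, "a1"), (2, "a2"), (3, "a3")]
B = [(1, "2009-10-05"), (1, "2009-11-02"), (2, "2009-09-15"), (3, "2009-10-20")]


def left_semi_join(left, right, lkey, rkey):
    # Hypothetical helper. Collect the distinct right-hand keys once;
    # duplicate right rows collapse into a single set entry here.
    right_keys = {rkey(r) for r in right}
    # Emit each left row at most once if ANY right row has a matching key.
    return [row for row in left if lkey(row) in right_keys]


def inner_join(left, right, lkey, rkey):
    # Hypothetical helper. Naive nested-loop inner join: one output pair
    # per matching (left, right) combination, so duplicates multiply rows.
    return [(l, r) for l in left for r in right if lkey(l) == rkey(r)]


# Subquery: select id from B where B.date > '2009-10-01'
# (ISO-8601 date strings compare correctly as plain strings).
b_recent = [r for r in B if r[1] > "2009-10-01"]

semi = left_semi_join(A, b_recent, lambda a: a[0], lambda b: b[0])

# Unnesting with a plain INNER JOIN repeats A's row for id 1.
inner = [l for (l, _) in inner_join(A, b_recent, lambda a: a[0], lambda b: b[0])]

# Deduplicating the subquery's IDs first makes the INNER JOIN equivalent.
distinct_ids = [(i,) for i in sorted({r[0] for r in b_recent})]
inner_dedup = [l for (l, _) in inner_join(A, distinct_ids, lambda a: a[0], lambda b: b[0])]

print(semi)         # [(1, 'a1'), (3, 'a3')]
print(inner)        # [(1, 'a1'), (1, 'a1'), (3, 'a3')]
print(inner_dedup)  # [(1, 'a1'), (3, 'a3')]
```

Because B contains two recent rows with id 1, `inner` repeats that row of A, while `semi` and `inner_dedup` agree; this is the deduplication step that LEFT SEMI JOIN makes unnecessary.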

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
