hadoop-hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-262) outer join gets some duplicate rows in some scenarios
Date Sat, 31 Jan 2009 23:35:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669288#action_12669288 ]

Namit Jain commented on HIVE-262:

However, the above does not work for the following:

A left outer join B on A.c1=B.c1 right outer join C on B.c1=C.c1

Consider the following rows for a given value of c1:

A -> a1 a2
B -> null
C -> c1 c2

Since there is no join, no pruning will happen, and the following output will be produced:

null null c1
null null c1
null null c2
null null c2

whereas the correct output is:

null null c1
null null c2

Note that 2 extra rows will be produced.

So, I think the patch's approach should be better.
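
For reference, the expected output in the example above can be reproduced with a small simulation. This is only a sketch of standard left-to-right SQL outer join semantics, not Hive's join operator or the patch. The helper names (left_outer_join, right_outer_join, eq), the (key, label) row layout, and the key value 1 are assumptions made for illustration; None stands for NULL.

```python
# Illustrative simulation of SQL outer join semantics (not Hive code).
# Each table row is a (key, label) tuple; None stands for SQL NULL.

def eq(x, y):
    # SQL-style equality: a NULL operand never matches anything.
    return x is not None and y is not None and x == y

def left_outer_join(left, right, match, right_width):
    out = []
    for l in left:
        matches = [l + r for r in right if match(l, r)]
        # Preserve an unmatched left row, padding the right side with NULLs.
        out.extend(matches if matches else [l + (None,) * right_width])
    return out

def right_outer_join(left, right, match, left_width):
    out = []
    for r in right:
        matches = [l + r for l in left if match(l, r)]
        # Preserve an unmatched right row, padding the left side with NULLs.
        out.extend(matches if matches else [(None,) * left_width + r])
    return out

# Rows for one value of the join key (1 is an arbitrary, assumed value).
A = [(1, "a1"), (1, "a2")]
B = []                      # B has no row for this key
C = [(1, "c1"), (1, "c2")]

# A left outer join B on A.c1 = B.c1
ab = left_outer_join(A, B, lambda l, r: eq(l[0], r[0]), right_width=2)

# ... right outer join C on B.c1 = C.c1 (B.c1 is column 2 of the joined row)
abc = right_outer_join(ab, C, lambda l, r: eq(l[2], r[0]), left_width=4)

# Keep one label column per table, as in the listing above.
result = [(row[1], row[3], row[5]) for row in abc]
print(result)  # [(None, None, 'c1'), (None, None, 'c2')]
```

Because B.c1 is NULL in every intermediate row, neither row of C finds a match, so each row of C is emitted exactly once with NULLs for A and B. That gives the two rows listed as the correct output, rather than one row per (A row, C row) pair.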

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>         Attachments: patch.262.1.txt, patch262.2.txt
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
