hive-dev mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5697) Correlation Optimizer may generate wrong plans for cases involving outer join
Date Sun, 10 Nov 2013 12:13:17 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818430#comment-13818430 ]

Hive QA commented on HIVE-5697:
-------------------------------



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12613032/HIVE-5697.2.patch

{color:green}SUCCESS:{color} +1 4600 tests passed

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/236/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/236/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12613032

> Correlation Optimizer may generate wrong plans for cases involving outer join
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-5697
>                 URL: https://issues.apache.org/jira/browse/HIVE-5697
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>         Attachments: HIVE-5697.1.patch, HIVE-5697.2.patch
>
>
> For example,
> {code:sql}
> select x.key, y.value, count(*) from src x right outer join src1 y on (x.key=y.key and x.value=y.value) group by x.key, y.value;
> {code}
> The Correlation Optimizer will determine that a single MR job is enough for this query. However, the group-by keys come from both the left and right tables of the right outer join.
> This produces a wrong result such as
> {code}
> NULL		4
> NULL	val_165	1
> NULL	val_193	1
> NULL	val_265	1
> NULL	val_27	1
> NULL	val_409	1
> NULL	val_484	1
> NULL		1
> 146	val_146	2
> 150	val_150	1
> 213	val_213	2
> NULL		1
> 238	val_238	2
> 255	val_255	2
> 273	val_273	3
> 278	val_278	2
> 311	val_311	3
> NULL		1
> 401	val_401	5
> 406	val_406	4
> 66	val_66	1
> 98	val_98	2
> {code}
> Rows in which both x.key and y.value are null may not be grouped together.
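
The scattered NULL rows in the output above are consistent with a group-by that aggregates runs of consecutive rows while the rows arrive ordered by the join key rather than by the group-by key. The following is only a minimal sketch of that effect, not Hive's actual operator code: the row data and both function names are hypothetical, and it assumes the reduce-side aggregation is sort-based.

```python
from collections import Counter
from itertools import groupby

# Hypothetical (x.key, y.value) pairs after the right outer join, listed in the
# order a reducer sorted on the join key (y.key, y.value) might emit them.
# x.key is None wherever the row from y found no match in x.
joined = [
    (None, ""),          # unmatched row from y
    (None, "val_165"),   # unmatched row from y
    (None, ""),          # another unmatched row, same group as the first
    ("146", "val_146"),
    ("146", "val_146"),
    (None, ""),          # same group again, but not adjacent to the others
    ("66", "val_66"),
]

def streaming_group_count(rows):
    """Count runs of consecutive equal keys, as a sort-based aggregation would."""
    return [(key, sum(1 for _ in run)) for key, run in groupby(rows)]

def hash_group_count(rows):
    """Count every key regardless of position, the result the query should give."""
    return Counter(rows)

streamed = streaming_group_count(joined)
hashed = hash_group_count(joined)

for key, count in streamed:
    print(key, count)
```

In this sketch the streaming version emits the `(None, "")` group three separate times, while the position-independent count collapses it into one group of three. That mirrors the repeated `NULL` lines in the reported output.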



--
This message was sent by Atlassian JIRA
(v6.1#6144)
