Date: Wed, 30 Oct 2013 17:23:26 +0000 (UTC)
From: "Yin Huai (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] [Updated] (HIVE-5697) Correlation Optimizer may generate wrong plans for cases involving outer join

[ https://issues.apache.org/jira/browse/HIVE-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated HIVE-5697:
---------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: HIVE-3667

> Correlation Optimizer may generate wrong plans for cases involving outer join
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-5697
>                 URL: https://issues.apache.org/jira/browse/HIVE-5697
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> For example,
> {code:sql}
> select x.key, y.value, count(*) from src x right outer join src1 y on (x.key=y.key and x.value=y.value) group by x.key, y.value;
> {code}
> The Correlation Optimizer will determine that a single MR job is enough for this query. However, the group-by keys come from both the left and the right tables of the right outer join. Because x.key is NULL for every row of y without a match, rows produced under different join keys can share the same (x.key, y.value) group, so the join's shuffle does not bring all rows of a group together.
> The result is then wrong, for example:
> {code}
> NULL           4
> NULL  val_165  1
> NULL  val_193  1
> NULL  val_265  1
> NULL  val_27   1
> NULL  val_409  1
> NULL  val_484  1
> NULL           1
> 146   val_146  2
> 150   val_150  1
> 213   val_213  2
> NULL           1
> 238   val_238  2
> 255   val_255  2
> 273   val_273  3
> 278   val_278  2
> 311   val_311  3
> NULL           1
> 401   val_401  5
> 406   val_406  4
> 66    val_66   1
> 98    val_98   2
> {code}
> Rows in which both x.key and y.value are null may not be grouped together: above, that single group is split across the NULL rows with counts 4, 1, 1 and 1.
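To see why one shuffle on the join key is not enough, here is a toy Python model. It is a sketch, not Hive's actual operators: it assumes that in the fused single-job plan the reducer receives joined rows sorted by the join key (key, value), and that the group-by emits a new group whenever (x.key, y.value) changes from one row to the next. The tables `X` and `Y` and all function names are hypothetical.

```python
from collections import Counter
from itertools import groupby

# Hypothetical toy tables, NOT the real src/src1 data: (key, value) rows.
# The empty string stands in for the blank y.value seen in the output above.
X = [("146", "val_146"), ("150", "val_150"), ("238", "val_238")]
Y = [("128", ""), ("146", "val_146"), ("150", "val_150"),
     ("224", ""), ("238", "val_238"), ("369", "")]


def right_outer_join(x_rows, y_rows):
    """Right outer join on (key, value).

    Returns (join_key, x_key, y_value) triples; x_key is None (NULL)
    for rows of y that have no match in x.
    """
    left = Counter(x_rows)
    out = []
    for yk, yv in y_rows:
        matches = left[(yk, yv)]  # Counter returns 0 for missing keys
        if matches:
            out.extend([((yk, yv), yk, yv)] * matches)
        else:
            out.append(((yk, yv), None, yv))
    return out


def one_job_group_by(joined):
    """Toy model of the fused plan: rows reach the reducer sorted by the
    JOIN key, and a streaming group-by only merges *adjacent* rows whose
    (x.key, y.value) are equal."""
    rows = sorted(joined, key=lambda r: r[0])
    return [(gkey, sum(1 for _ in grp))
            for gkey, grp in groupby(rows, key=lambda r: (r[1], r[2]))]


def correct_group_by(joined):
    """Reference semantics: count each (x.key, y.value) pair over all rows."""
    return Counter((xk, yv) for _, xk, yv in joined)


joined = right_outer_join(X, Y)
print(one_job_group_by(joined))              # (None, '') appears as three separate groups
print(correct_group_by(joined)[(None, "")])  # 3
```

In this model the unmatched rows 128, 224 and 369 all map to the group (None, ''), but they sit under different join keys, so other groups are interleaved between them and the streaming group-by emits three groups of count 1 instead of one group of count 3, much like the split NULL rows in the output above. The reference version, which aggregates over all joined rows at once, does not have this problem.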