Date: Fri, 2 Aug 2013 14:17:50 +0000 (UTC)
From: "Hudson (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

    [ https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727696#comment-13727696 ]

Hudson commented on HIVE-4952:
------------------------------

SUCCESS: Integrated in Hive-trunk-h0.21 #2239 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2239/])
HIVE-4952 : When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results (Yin Huai via Ashutosh Chauhan) (hashutosh:
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1509542)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
* /hive/trunk/ql/src/test/queries/clientpositive/correlationoptimizer15.q
* /hive/trunk/ql/src/test/results/clientpositive/correlationoptimizer15.q.out


> When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4952
>                 URL: https://issues.apache.org/jira/browse/HIVE-4952
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>             Fix For: 0.12.0
>
>         Attachments: HIVE-4952.D11889.1.patch, HIVE-4952.D11889.2.patch, replay.txt
>
>
> If we have a query like this ...
> {code:sql}
> SELECT xx.key, xx.cnt, yy.key
> FROM
> (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = y.key) group by x.key) xx
> JOIN src yy
> ON xx.key=yy.key;
> {code}
> After the Correlation Optimizer has been applied, the operator tree in the reducer will be
> {code}
>        JOIN2
>          |
>          |
>         MUX
>        /   \
>       /     \
>     GBY      |
>      |       |
>    JOIN1     |
>       \     /
>        \   /
>        DEMUX
> {code}
> For JOIN2, rows from the right table arrive at the operator first. If hive.join.emit.interval is small, e.g. 1, JOIN2 will output results even though it has not yet received any rows from the left table. The logic related to hive.join.emit.interval in JoinOperator assumes that inputs are ordered by tag. However, if a query has been optimized by the Correlation Optimizer, this assumption may not hold for the JoinOperators inside the reducer.
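
The ordering assumption described in the issue can be illustrated with a small toy model. The sketch below is not Hive code: the function name streaming_inner_join, the (tag, value) row format, and the flush rule are all assumptions made for illustration. It models an inner join over a single key that, once emit_interval rows of the last tag are buffered, joins them against whatever the other tags have buffered so far and then discards the last-tag buffer. In this toy model an early flush against an empty left side silently drops the right-table rows; the exact symptom in Hive may differ.

```python
import itertools


def streaming_inner_join(rows, num_tags, emit_interval):
    """Toy model of a reduce-side inner join for ONE join key.

    rows: iterable of (tag, value) pairs in arrival order.
    Hypothetical flush rule (a simplification, not Hive's JoinOperator):
    when the buffer of the last tag holds `emit_interval` rows, join it
    against the other buffers, emit the result, and clear that buffer.
    This is only safe if all rows of the earlier tags arrived first.
    """
    buffers = [[] for _ in range(num_tags)]
    last = num_tags - 1
    out = []

    def emit():
        # Cross product of all buffers; yields nothing if any buffer is empty.
        out.extend(itertools.product(*buffers))

    for tag, value in rows:
        buffers[tag].append(value)
        if tag == last and len(buffers[last]) == emit_interval:
            emit()
            buffers[last].clear()

    emit()  # end of the key group: join whatever is still buffered
    return out


# Rows ordered by tag (left = tag 0, right = tag 1): the assumption holds.
ordered = [(0, "L"), (1, "R1"), (1, "R2")]
# Right table first, as described for JOIN2 above MUX: the assumption breaks.
right_first = [(1, "R1"), (1, "R2"), (0, "L")]

print(streaming_inner_join(ordered, 2, 1))
print(streaming_inner_join(right_first, 2, 1))
print(streaming_inner_join(right_first, 2, 1000))
```

With rows ordered by tag, an emit interval of 1 yields both joined rows. With the right table arriving first, the same interval yields nothing, because each right row is flushed before any left row exists. With a large interval, nothing is flushed early and the result is correct again, which matches the observation that the problem only shows up when hive.join.emit.interval is small.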