Date: Thu, 22 Dec 2016 10:33:58 +0000 (UTC)
From: "Hive QA (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Commented] (HIVE-15493) Wrong result for LEFT outer join in Tez using MapJoinOperator

    [ https://issues.apache.org/jira/browse/HIVE-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769724#comment-15769724 ]

Hive QA commented on HIVE-15493:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844325/HIVE-15493.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10897 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=234)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] (batchId=72)
org.apache.hive.hcatalog.api.TestHCatClientNotification.createTable (batchId=220)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery (batchId=216)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2693/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2693/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2693/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12844325 - PreCommit-HIVE-Build

> Wrong result for LEFT outer join in Tez using MapJoinOperator
> -------------------------------------------------------------
>
>                 Key: HIVE-15493
>                 URL: https://issues.apache.org/jira/browse/HIVE-15493
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>         Attachments: HIVE-15493.patch
>
>
> To reproduce, run the following in Tez:
> {code:sql}
> set hive.auto.convert.join=true;
>
> DROP TABLE IF EXISTS test_1;
> CREATE TABLE test_1
> (
>     member BIGINT
>   , age VARCHAR (100)
> )
> STORED AS TEXTFILE
> ;
>
> DROP TABLE IF EXISTS test_2;
> CREATE TABLE test_2
> (
>     member BIGINT
> )
> STORED AS TEXTFILE
> ;
>
> INSERT INTO test_1 VALUES (1, '20'), (2, '30'), (3, '40');
> INSERT INTO test_2 VALUES (1), (2), (3);
>
> SELECT
>     t2.member
>   , t1.age_1
>   , t1.age_2
> FROM
>     test_2 t2
>     LEFT JOIN (
>         SELECT
>             member
>           , age as age_1
>           , age as age_2
>         FROM
>             test_1
>     ) t1
>     ON t2.member = t1.member
> ;
> {code}
>
> Result is:
> {noformat}
> 1	20	NULL
> 3	40	NULL
> 2	30	NULL
> {noformat}
>
> Correct result is:
> {noformat}
> 1	20	20
> 3	40	40
> 2	30	30
> {noformat}
>
> The bug was introduced by HIVE-10582. Although the fix in HIVE-10582 does not include tests, it looks legitimate; the problem seems to be in the MapJoinOperator itself. It only happens for LEFT outer joins (not for RIGHT outer or FULL outer joins). I am still trying to understand part of the MapJoinOperator code path, but the bug could be in the initialization of the operator. It only happens when the right side of the join output contains duplicated value columns (in the query above, age is selected twice, as age_1 and age_2).
>
> Until we have more time to study the problem in detail and fix the MapJoinOperator, I will submit a fix that removes the code in SemanticAnalyzer that reuses duplicated value expressions from the RS (ReduceSink) to create multiple columns in the join output (this is equivalent to reverting HIVE-10582).
> Once this is pushed, I will create a follow-up issue to restore this code and fix the problem in the MapJoinOperator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
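The description above attributes the regression to HIVE-10582, which reuses a duplicated value expression (age, selected as both age_1 and age_2) instead of shipping it twice through the ReduceSink. Below is a minimal Python sketch, assuming a simplified in-memory model, of that reuse and of the output a correct LEFT outer map join should produce for the reproduction. The helper names (dedup_value_exprs, left_outer_map_join) and the dict-based rows are hypothetical and do not correspond to Hive's SemanticAnalyzer or MapJoinOperator classes. The sketch models only the intended semantics, not the still-undiagnosed defect in the operator. It treats the right input (t1) as the hash-table side, as a LEFT outer map join requires.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical model, not Hive's API: a row is a tuple of nullable values.
Row = Tuple[Optional[object], ...]


def dedup_value_exprs(exprs: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Return the distinct value expressions (what would be shipped once)
    and, for each requested output column, the index of its shipped value."""
    distinct: List[str] = []
    position: Dict[str, int] = {}
    mapping: List[int] = []
    for expr in exprs:
        if expr not in position:
            position[expr] = len(distinct)
            distinct.append(expr)
        mapping.append(position[expr])
    return distinct, mapping


def left_outer_map_join(
    left_keys: Sequence[object],
    right_rows: Sequence[Dict[str, object]],
    key_col: str,
    value_exprs: Sequence[str],
) -> List[Row]:
    """Hypothetical LEFT outer hash join: build on the right rows, probe with left keys."""
    distinct, mapping = dedup_value_exprs(value_exprs)
    # Build side: join key -> list of deduplicated value tuples.
    table: Dict[object, List[Tuple[object, ...]]] = {}
    for r in right_rows:
        table.setdefault(r[key_col], []).append(tuple(r[e] for e in distinct))
    out: List[Row] = []
    for k in left_keys:
        matches = table.get(k)
        if not matches:
            # LEFT outer: an unmatched left row gets NULL in every right-side column.
            out.append((k,) + (None,) * len(mapping))
            continue
        for values in matches:
            # Each output column reads through the mapping, so age_1 and
            # age_2 both resolve to the single shipped "age" value.
            out.append((k,) + tuple(values[i] for i in mapping))
    return out
```

With the reproduction data (left keys 1, 3, 2 from test_2 and the three test_1 rows) and value expressions ["age", "age"], the sketch returns (1, "20", "20"), (3, "40", "40") and (2, "30", "30"). This matches the correct result in the report, whereas the reported bug leaves the reused second column (age_2) NULL.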