Date: Thu, 2 Jun 2016 10:18:06 +0000 (UTC)
From: "Jesus Camacho Rodriguez (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-12762) Common join on parquet tables returns incorrect result when hive.optimize.index.filter set to true

     [ https://issues.apache.org/jira/browse/HIVE-12762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-12762:
-------------------------------------------
    Fix Version/s:     (was: 2.1.0)

> Common join on parquet tables returns incorrect result when hive.optimize.index.filter set to true
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-12762
>                 URL: https://issues.apache.org/jira/browse/HIVE-12762
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>             Fix For: 2.0.0
>
>         Attachments: HIVE-12762.2.patch, HIVE-12762.patch
>
>
> The following query returns an incorrect result.
> {noformat}
> CREATE TABLE tbl1(id INT) STORED AS PARQUET;
> INSERT INTO tbl1 VALUES(1), (2);
> CREATE TABLE tbl2(id INT, value STRING) STORED AS PARQUET;
> INSERT INTO tbl2 VALUES(1, 'value1');
> INSERT INTO tbl2 VALUES(1, 'value2');
> set hive.optimize.index.filter = true;
> set hive.auto.convert.join=false;
> select tbl1.id, t1.value, t2.value
> FROM tbl1
> JOIN (SELECT * FROM tbl2 WHERE value='value1') t1 ON tbl1.id=t1.id
> JOIN (SELECT * FROM tbl2 WHERE value='value2') t2 ON tbl1.id=t2.id;
> {noformat}
> Setting hive.auto.convert.join=false forces a common join, and tbl2 has 2 underlying files after the 2 insertions.
> The map job contains 3 TableScan operators (2 for tbl2 and 1 for tbl1). When hive.optimize.index.filter is set to true, we incorrectly apply the later filtering condition to each block, so no data is returned for the subquery {{SELECT * FROM tbl2 WHERE value='value1'}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
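
The failure described in the issue can be illustrated with a small toy model. This is a sketch only and is not Hive code: the names scan_tbl2, run_query and share_last_filter are hypothetical, and the assumption that every scan of tbl2 sees the filter of the last subquery is one reading of "applying the later filtering condition to each block". Under that assumption, the scan for t1 skips the file that holds 'value1', so the join comes back empty where one row is expected.

```python
# Toy model of the reported behaviour. Not Hive code: every name here is hypothetical.

# tbl1 holds ids 1 and 2; tbl2 is stored as two files, one per INSERT statement.
TBL1 = [1, 2]
TBL2_FILES = [
    [(1, "value1")],  # file written by the first INSERT
    [(1, "value2")],  # file written by the second INSERT
]


def scan_tbl2(pushed_value, predicate_value):
    """Scan tbl2: skip files via the pushed-down filter, then apply the row filter."""
    rows = []
    for file_rows in TBL2_FILES:
        # Pushed-down filter: skip a file when none of its rows can match.
        if not any(value == pushed_value for _, value in file_rows):
            continue
        # Regular filter from the subquery's WHERE clause.
        rows.extend(row for row in file_rows if row[1] == predicate_value)
    return rows


def run_query(share_last_filter):
    """Evaluate the three-way join; share_last_filter=True mimics the reported bug."""
    predicates = {"t1": "value1", "t2": "value2"}
    # Assumed bug model: every scan in the map job sees the filter registered last.
    last = predicates["t2"]
    scans = {
        alias: scan_tbl2(last if share_last_filter else value, value)
        for alias, value in predicates.items()
    }
    return [
        (id_, v1, v2)
        for id_ in TBL1
        for (id1, v1) in scans["t1"] if id1 == id_
        for (id2, v2) in scans["t2"] if id2 == id_
    ]


print(run_query(share_last_filter=False))  # [(1, 'value1', 'value2')]
print(run_query(share_last_filter=True))   # []
```

The first call gives each scan its own filter and yields the single row the query should return. The second call shares the 'value2' filter across both scans, which reproduces the empty result in this model.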