Return-Path: X-Original-To: apmail-hive-issues-archive@minotaur.apache.org Delivered-To: apmail-hive-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 5CFA018B18 for ; Wed, 27 May 2015 23:09:18 +0000 (UTC) Received: (qmail 69488 invoked by uid 500); 27 May 2015 23:09:18 -0000 Delivered-To: apmail-hive-issues-archive@hive.apache.org Received: (qmail 69466 invoked by uid 500); 27 May 2015 23:09:18 -0000 Mailing-List: contact issues-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list issues@hive.apache.org Received: (qmail 69456 invoked by uid 99); 27 May 2015 23:09:18 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 27 May 2015 23:09:18 +0000 Date: Wed, 27 May 2015 23:09:18 +0000 (UTC) From: "Alexander Pivovarov (JIRA)" To: issues@hive.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work for large queries MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10841: --------------------------------------- Description: The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10 122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10 122 {code} 3. The workaround is to put "acct.brn is not null" to join condition {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid and acct.brn is not null WHERE L.id = 4436; OK 10 122 {code} I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations have the issue was: The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10 122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10 122 {code} 3. workaround is to put "acct.brn is not null" to join condition {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid and acct.brn is not null WHERE L.id = 4436; {code} I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations have the issue > [WHERE col is not null] does not work for large queries > ------------------------------------------------------- > > Key: HIVE-10841 > URL: https://issues.apache.org/jira/browse/HIVE-10841 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Alexander Pivovarov > > The result from the following SELECT query is 3 rows but it should be 1 row. > I checked it in MySQL - it returned 1 row. > To reproduce the issue in Hive > 1. prepare tables > {code} > drop table if exists L; > drop table if exists LA; > drop table if exists FR; > drop table if exists A; > drop table if exists PI; > drop table if exists acct; > create table L as select 4436 id; > create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; > create table FR as select 4436 loan_id; > create table A as select 4748 id; > create table PI as select 4415 id; > create table acct as select 4748 aid, 10 acc_n, 122 brn; > insert into table acct values(4748, null, null); > insert into table acct values(4748, null, null); > {code} > 2. run SELECT query > {code} > select > acct.ACC_N, > acct.brn > FROM L > JOIN LA ON L.id = LA.loan_id > JOIN FR ON L.id = FR.loan_id > JOIN A ON LA.aid = A.id > JOIN PI ON PI.id = LA.pi_id > JOIN acct ON A.id = acct.aid > WHERE > L.id = 4436 > and acct.brn is not null; > {code} > the result is 3 rows > {code} > 10 122 > NULL NULL > NULL NULL > {code} > but it should be 1 row > {code} > 10 122 > {code} > 3. The workaround is to put "acct.brn is not null" to join condition > {code} > select > acct.ACC_N, > acct.brn > FROM L > JOIN LA ON L.id = LA.loan_id > JOIN FR ON L.id = FR.loan_id > JOIN A ON LA.aid = A.id > JOIN PI ON PI.id = LA.pi_id > JOIN acct ON A.id = acct.aid and acct.brn is not null > WHERE > L.id = 4436; > OK > 10 122 > {code} > I tried it on hive-1.3.0 (MR) and hive-0.13.1 (MR and Tez) - all combinations have the issue -- This message was sent by Atlassian JIRA (v6.3.4#6332)