Date: Tue, 15 Dec 2015 17:09:46 +0000 (UTC)
From: "Nicholas Brenwald (JIRA)"
To: dev@hive.apache.org
Subject: [jira] [Created] (HIVE-12678) BETWEEN relational operator sometimes returns incorrect results against PARQUET tables

Nicholas Brenwald created HIVE-12678:
----------------------------------------

             Summary: BETWEEN relational operator sometimes returns incorrect results against PARQUET tables
                 Key: HIVE-12678
                 URL: https://issues.apache.org/jira/browse/HIVE-12678
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.1.0
            Reporter: Nicholas Brenwald


When querying a parquet table, the BETWEEN relational operator returns incorrect results when hive.optimize.index.filter and hive.optimize.ppd.storage are both enabled.

Create a parquet table:
{code}
create table t(c string) stored as
parquet;
{code}

Insert some strings representing dates:
{code}
insert into t select '2015-12-09' from default.dual limit 1;
insert into t select '2015-12-10' from default.dual limit 1;
insert into t select '2015-12-11' from default.dual limit 1;
{code}

h3. Example 1

This query correctly returns 3:
{code}
set hive.optimize.index.filter=true;
set hive.optimize.ppd.storage=true;
select count(*) from t where c >= '2015-12-09' and c <= '2015-12-11';
+------+--+
| _c0  |
+------+--+
| 3    |
+------+--+
{code}

This query incorrectly returns 1:
{code}
set hive.optimize.index.filter=true;
set hive.optimize.ppd.storage=true;
select count(*) from t where c between '2015-12-09' and '2015-12-11';
+------+--+
| _c0  |
+------+--+
| 1    |
+------+--+
{code}

Disabling hive.optimize.index.filter resolves the problem.
This query now correctly returns 3:
{code}
set hive.optimize.index.filter=false;
set hive.optimize.ppd.storage=true;
select count(*) from t where c between '2015-12-09' and '2015-12-11';
+------+--+
| _c0  |
+------+--+
| 3    |
+------+--+
{code}

Disabling hive.optimize.ppd.storage resolves the problem.
This query now correctly returns 3:
{code}
set hive.optimize.index.filter=true;
set hive.optimize.ppd.storage=false;
select count(*) from t where c between '2015-12-09' and '2015-12-11';
+------+--+
| _c0  |
+------+--+
| 3    |
+------+--+
{code}

h3. Example 2

This query correctly returns 1:
{code}
set hive.optimize.index.filter=true;
set hive.optimize.ppd.storage=true;
select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10';
+------+--+
| _c0  |
+------+--+
| 1    |
+------+--+
{code}

This query incorrectly returns 0:
{code}
set hive.optimize.index.filter=true;
set hive.optimize.ppd.storage=true;
select count(*) from t where c between '2015-12-10' and '2015-12-10';
+------+--+
| _c0  |
+------+--+
| 0    |
+------+--+
{code}

Disabling hive.optimize.index.filter resolves the problem.
This query now correctly returns 1:
{code}
set hive.optimize.index.filter=false;
set hive.optimize.ppd.storage=true;
select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10';
+------+--+
| _c0  |
+------+--+
| 1    |
+------+--+
{code}

Disabling hive.optimize.ppd.storage resolves the problem.
This query now correctly returns 1:
{code}
set hive.optimize.index.filter=true;
set hive.optimize.ppd.storage=false;
select count(*) from t where c >= '2015-12-10' and c <= '2015-12-10';
+------+--+
| _c0  |
+------+--+
| 1    |
+------+--+
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
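
For reference, the expected counts in both examples can be cross-checked outside Hive. The sketch below is not part of the original report: it assumes SQLite (through Python's standard sqlite3 module) is an acceptable stand-in for the standard SQL semantics, where c BETWEEN a AND b is equivalent to c >= a AND c <= b, and it reuses the table and values from the steps above. The helper names build_table and count_rows are hypothetical. It says nothing about where the Hive predicate pushdown goes wrong; it only shows what the two forms should agree on.

```python
import sqlite3


def build_table():
    """Create an in-memory table t(c) holding the three date strings from the report."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t(c text)")
    conn.executemany(
        "insert into t values (?)",
        [("2015-12-09",), ("2015-12-10",), ("2015-12-11",)],
    )
    return conn


def count_rows(conn, where_clause):
    """Run select count(*) from t with the given WHERE clause and return the count."""
    sql = "select count(*) from t where " + where_clause
    return conn.execute(sql).fetchone()[0]


conn = build_table()

# (lower bound, upper bound) pairs taken from Example 1 and Example 2.
cases = [("2015-12-09", "2015-12-11"), ("2015-12-10", "2015-12-10")]

for lo, hi in cases:
    explicit = count_rows(conn, f"c >= '{lo}' and c <= '{hi}'")
    between = count_rows(conn, f"c between '{lo}' and '{hi}'")
    # Both forms should agree: 3 for the first pair, 1 for the second.
    print(lo, hi, explicit, between)
```

On this reference engine both forms return 3 for Example 1 and 1 for Example 2, which matches the results Hive gives once either setting is disabled.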