hive-issues mailing list archives

From "Yongzhi Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16869) Hive returns wrong result when predicates on non-existing columns are pushed down to Parquet reader
Date Mon, 26 Jun 2017 15:17:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063242#comment-16063242 ]

Yongzhi Chen commented on HIVE-16869:
-------------------------------------

Returning null when any of the "or" sub-conditions returns null is effectively the same as turning off
hive.optimize.index.filter when the filter references columns that do not exist in the Parquet file.
It is a quick fix until the partition filter issue is handled by Parquet.
The change looks good. +1
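
For readers following the thread, here is a minimal sketch of the rule described above. It is not Hive's converter code: the tuple-based predicate tree, the function names `to_pushdown` and `matches`, and the `strict_or` flag are all hypothetical, chosen only for illustration. It assumes that a leaf on a column missing from the file converts to null (None), that an "and" may drop null children because the remaining filter only becomes less selective, and that an "or" with a null child must become null as a whole.

```python
# Hypothetical predicate tree (illustration only, not Hive's API):
#   ("leaf", column, value)  means column = value
#   ("and", [children]) / ("or", [children])

def to_pushdown(pred, file_columns, strict_or=True):
    """Return a predicate that can be pushed to the file reader, or None."""
    kind = pred[0]
    if kind == "leaf":
        # A column that is not stored in the file cannot be evaluated there.
        return pred if pred[1] in file_columns else None
    children = [to_pushdown(c, file_columns, strict_or) for c in pred[1]]
    if kind == "or" and strict_or and any(c is None for c in children):
        # Dropping one branch of an "or" would make the filter stricter
        # than the query, so give up on the whole "or".
        return None
    kept = [c for c in children if c is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else (kind, kept)

def matches(pred, row):
    """Evaluate a pushed-down predicate; None means 'no filter'."""
    if pred is None:
        return True
    if pred[0] == "leaf":
        return row.get(pred[1]) == pred[2]
    results = [matches(c, row) for c in pred[1]]
    return all(results) if pred[0] == "and" else any(results)

# The file stores a and b; p is a partition column and is not in the file.
file_columns = {"a", "b"}
row = {"a": 1, "b": 2}

# (a=1 or a=999) and (a=999 or p=1), the last query in the issue description.
query = ("and", [
    ("or", [("leaf", "a", 1), ("leaf", "a", 999)]),
    ("or", [("leaf", "a", 999), ("leaf", "p", 1)]),
])

lenient = to_pushdown(query, file_columns, strict_or=False)
strict = to_pushdown(query, file_columns, strict_or=True)

print(matches(lenient, row))  # False: the stored row is filtered out
print(matches(strict, row))   # True: the stored row survives
```

In the lenient variant the second "or" collapses to a=999, so the pushed filter rejects the only stored row. With the strict rule that "or" is dropped entirely and only (a=1 or a=999) is pushed down, leaving the condition on p to be evaluated outside the reader.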

> Hive returns wrong result when predicates on non-existing columns are pushed down to Parquet reader
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16869
>                 URL: https://issues.apache.org/jira/browse/HIVE-16869
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Yibing Shi
>            Assignee: Yibing Shi
>            Priority: Critical
>         Attachments: HIVE-16869.1.patch, HIVE-16869.2.patch
>
>
> When {{hive.optimize.ppd}} and {{hive.optimize.index.filter}} are turned on, and a select
> query has a condition on a column that doesn't exist in the Parquet file (such as a partition
> column), Hive often returns a wrong result.
> Please see below example for details:
> {noformat}
> hive> create table test_parq (a int, b int) partitioned by (p int) stored as parquet;
> OK
> Time taken: 0.292 seconds
> hive> insert overwrite table test_parq partition (p=1) values (1, 2);
> OK
> Time taken: 5.08 seconds
> hive> select * from test_parq where a=1 and p=1;
> OK
> 1	2	1
> Time taken: 0.441 seconds, Fetched: 1 row(s)
> hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999);
> OK
> 1	2	1
> Time taken: 0.197 seconds, Fetched: 1 row(s)
> hive> set hive.optimize.index.filter=true;
> hive> select * from test_parq where (a=1 and p=1) or (a=999 and p=999);
> OK
> Time taken: 0.167 seconds
> hive> select * from test_parq where (a=1 or a=999) and (a=999 or p=1);
> OK
> Time taken: 0.563 seconds
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
