hive-dev mailing list archives

From "Sergio Pena" <sergio.p...@cloudera.com>
Subject Re: Review Request 41984: HIVE-12762: Common join on parquet tables returns incorrect result when hive.optimize.index.filter set to true
Date Thu, 07 Jan 2016 21:17:39 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41984/#review113327
-----------------------------------------------------------

Ship it!

- Sergio Pena


On Jan. 6, 2016, 4:52 p.m., Aihua Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41984/
> -----------------------------------------------------------
> 
> (Updated Jan. 6, 2016, 4:52 p.m.)
> 
> 
> Review request for hive, Sergio Pena and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-12762: Common join on parquet tables returns incorrect result when hive.optimize.index.filter set to true
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 9a7d990baaabfde8e564f00bb1fcfe30cd16dc90 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java 017676bec2163f04fb95f43224a4f8743fa49f55 
>   ql/src/test/queries/clientpositive/parquet_join2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_join2.q.out PRE-CREATION 
>   storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/ExpressionTree.java 577d95d1a15a54c2804349b3e5e68d83b72df664 
>   storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java eeff131cbc14d7ef554517109612ae7d891f8003 
> 
> Diff: https://reviews.apache.org/r/41984/diff/
> 
> 
> Testing
> -------
> 
> We have two issues:
> 1. We filter the Parquet columns based on the last filter condition in the query. If the query
> references the same table more than once with different filter conditions (e.g., a join of a
> table with itself), we can get incorrect results.
> 2. The rewriteLeaves implementation in SearchArgumentImpl is not correct, because different
> leaves can share the same object. The current implementation can therefore change a leaf index
> multiple times, ending up with an incorrect value.
> 
> The patch merges all the filter conditions (creating an OR expression over all the filters)
> so that columns used by later operators are not filtered out during the earlier split stage.
> rewriteLeaves is reimplemented to collect all the unique leaves first and then replace their
> indices in place.
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>
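A minimal Java sketch of the first issue, assuming a toy model in which a table is a list of integer keys and each scan of that table carries its own `java.util.function.Predicate`. `MergedFilterSketch` and `scanResult` are hypothetical names, not Hive APIs. The model assumes every scan of the same table shares one pushed-down predicate and then re-applies its own filter, the way a filter operator would. Pushing only the last filter starves the other scan; pushing the OR of all filters keeps every row that some scan needs.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical toy model of HIVE-12762 issue 1; names are illustrative, not Hive APIs.
class MergedFilterSketch {

    /**
     * Rows that scan number {@code scan} sees. All scans of the same table share one
     * pushed-down predicate; each scan then re-applies its own filter, like a filter operator.
     */
    static List<Integer> scanResult(List<Integer> table, List<Predicate<Integer>> scanFilters,
                                    int scan, boolean mergeFilters) {
        Predicate<Integer> pushed;
        if (mergeFilters) {
            // Merged: OR every scan's filter together so no scan loses rows it needs.
            Predicate<Integer> none = k -> false;
            pushed = scanFilters.stream().reduce(none, Predicate::or);
        } else {
            // Buggy: the last filter registered for the table wins.
            pushed = scanFilters.get(scanFilters.size() - 1);
        }
        return table.stream()
                .filter(pushed)                 // simulated reader-side filtering
                .filter(scanFilters.get(scan))  // the scan's own filter
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> table = List.of(1, 2, 3);
        // Self-join: the left side wants key = 1, the right side wants key = 3.
        List<Predicate<Integer>> filters = List.of(k -> k == 1, k -> k == 3);
        System.out.println("last filter, left scan:   " + scanResult(table, filters, 0, false));
        System.out.println("merged filter, left scan: " + scanResult(table, filters, 0, true));
    }
}
```

In this toy self-join, the left scan gets an empty list when only the last filter (`key = 3`) is pushed down, and `[1]` when the OR of both filters is pushed down.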

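A second toy sketch, this time of the rewriteLeaves problem. It assumes a hypothetical `Node` class in which AND/OR nodes hold children and leaf nodes hold a mutable leaf index; this is not Hive's `ExpressionTree`. When the same leaf object is reachable twice, remapping indices during traversal applies the mapping twice. Collecting the distinct leaf objects by identity first, then remapping each one once, avoids that.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a search-argument expression tree; not Hive's ExpressionTree.
class LeafRewriteSketch {

    static final class Node {
        final List<Node> children = new ArrayList<>();
        int leafIndex = -1; // index into the leaf list, or -1 for an AND/OR node

        static Node newLeaf(int index) {
            Node n = new Node();
            n.leafIndex = index;
            return n;
        }

        static Node newOp(Node... kids) {
            Node n = new Node();
            Collections.addAll(n.children, kids);
            return n;
        }
    }

    /** Buggy: remaps while traversing, so a shared leaf is remapped once per reference. */
    static void rewriteLeavesNaive(Node node, int[] newIndex) {
        if (node.leafIndex >= 0) {
            node.leafIndex = newIndex[node.leafIndex];
        } else {
            for (Node child : node.children) {
                rewriteLeavesNaive(child, newIndex);
            }
        }
    }

    /** Fixed: collect each distinct leaf object once (by identity), then remap it in place. */
    static void rewriteLeavesUnique(Node root, int[] newIndex) {
        Set<Node> unique = Collections.newSetFromMap(new IdentityHashMap<>());
        collectLeaves(root, unique);
        for (Node leaf : unique) {
            leaf.leafIndex = newIndex[leaf.leafIndex];
        }
    }

    private static void collectLeaves(Node node, Set<Node> out) {
        if (node.leafIndex >= 0) {
            out.add(node);
        } else {
            for (Node child : node.children) {
                collectLeaves(child, out);
            }
        }
    }

    public static void main(String[] args) {
        int[] swap = {1, 0}; // old leaf index -> new leaf index

        Node a = Node.newLeaf(0), b = Node.newLeaf(1);
        rewriteLeavesNaive(Node.newOp(a, Node.newOp(a, b)), swap); // leaf a is shared
        System.out.println("naive:  a=" + a.leafIndex + " b=" + b.leafIndex);

        Node c = Node.newLeaf(0), d = Node.newLeaf(1);
        rewriteLeavesUnique(Node.newOp(c, Node.newOp(c, d)), swap);
        System.out.println("unique: a=" + c.leafIndex + " b=" + d.leafIndex);
    }
}
```

With the mapping {0 → 1, 1 → 0} and leaf `a` referenced twice, the naive traversal moves `a` from 0 to 1 and back to 0, so both leaves end up at index 0; the identity-set version leaves `a` at 1 and `b` at 0.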
