hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aihua Xu" <...@cloudera.com>
Subject Review Request 41984: HIVE-12762: Common join on parquet tables returns incorrect result when hive.optimize.index.filter set to true
Date Wed, 06 Jan 2016 16:52:22 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41984/
-----------------------------------------------------------

Review request for hive, Sergio Pena and Xuefu Zhang.


Repository: hive-git


Description
-------

HIVE-12762: Common join on parquet tables returns incorrect result when hive.optimize.index.filter
set to true


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 9a7d990baaabfde8e564f00bb1fcfe30cd16dc90

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java 017676bec2163f04fb95f43224a4f8743fa49f55

  ql/src/test/queries/clientpositive/parquet_join2.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_join2.q.out PRE-CREATION 
  storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/ExpressionTree.java 577d95d1a15a54c2804349b3e5e68d83b72df664

  storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java eeff131cbc14d7ef554517109612ae7d891f8003


Diff: https://reviews.apache.org/r/41984/diff/


Testing
-------

We have two issues: 1. We are filtering the parquet columns based on the last filter condition
in the query. So if the query contains multiple instances of the same table, e.g., join on
the same table with different filter conditions, then we could get incorrect result; 2. rewriteLeaves
implementation in SearchArgumentImpl is not accurate since the different leaves could be sharing
the same object. The current implementation could change the leave index multiple times to
an incorrect value.

The patch will merge all the filter conditions (create OR expression on all the filters) so
that the columns which will be used during operator won't be filtered during earlier splitting
stage. rewriteLeaves is reimplemented to get all the unique leaves first and replace in place.


Thanks,

Aihua Xu


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message