hive-dev mailing list archives

From "Reuben Kuhnert" <sircodesa...@gmail.com>
Subject Re: Review Request 36942: HIVE-11401: Predicate push down does not work with Parquet when partitions are in the expression
Date Thu, 30 Jul 2015 16:29:51 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36942/#review93587
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java
(line 54)
<https://reviews.apache.org/r/36942/#comment147977>

    If the goal here is to get just the top-level fields, can we do something like:
    
    ```
    for (Type field : schema.getFields()) {  
      columns.add(field.getName());
    }
    ``` 
    
    This might be a little bit clearer.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java
(line 64)
<https://reviews.apache.org/r/36942/#comment147969>

    Minor nit: since we have the opportunity to fix it, can we change 'leafs' to 'leaves'?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java
(line 102)
<https://reviews.apache.org/r/36942/#comment147978>

    List<T> has O(N) lookup time. Can we store this in a HashSet<T> (O(1) average-case lookup) instead?
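    A rough sketch of the idea (class and variable names here are hypothetical, not from the patch): copy the column names into a HashSet once, and each subsequent membership check is O(1) on average rather than a linear scan:

    ```java
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class ColumnSetSketch {
        // Hypothetical helper: build a HashSet of schema column names so
        // contains() runs in O(1) average time instead of O(N) on a List.
        static Set<String> toColumnSet(List<String> columnNames) {
            return new HashSet<>(columnNames);
        }

        public static void main(String[] args) {
            Set<String> columns = toColumnSet(Arrays.asList("id", "name", "ts"));
            // A partitioned column would not appear in the Parquet schema.
            System.out.println(columns.contains("name"));     // true
            System.out.println(columns.contains("part_col")); // false
        }
    }
    ```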


- Reuben Kuhnert


On July 30, 2015, 3:43 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36942/
> -----------------------------------------------------------
> 
> (Updated July 30, 2015, 3:43 p.m.)
> 
> 
> Review request for hive, Aihua Xu, cheng xu, Dong Chen, and Szehon Ho.
> 
> 
> Bugs: HIVE-11401
>     https://issues.apache.org/jira/browse/HIVE-11401
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The following patch reviews the predicate created by Hive and removes any column that does not belong to the Parquet schema, such as partitioned columns. This way Parquet can filter the columns correctly.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetFilterPredicateConverter.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 49e52da2e26fd7213df1db88716eaee94cb536b8 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java 87dd344534f09c7fc565fdc467ac82a51f37ebba 
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestConvertAstToSearchArg.java 85e952fb6855a2a03902ed971f54191837b32dac 
>   ql/src/test/queries/clientpositive/parquet_predicate_pushdown.q PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_predicate_pushdown.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/36942/diff/
> 
> 
> Testing
> -------
> 
> Unit tests: TestParquetFilterPredicate.java
> Integration tests: parquet_predicate_pushdown.q
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>

