hive-dev mailing list archives

From "Dong Chen" <dong1.c...@intel.com>
Subject Re: Review Request 36540: HIVE-8128: Improve Parquet Vectorization
Date Tue, 21 Jul 2015 08:44:47 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36540/
-----------------------------------------------------------

(Updated July 21, 2015, 8:44 a.m.)


Review request for hive, Ryan Blue, cheng xu, and Sergio Pena.


Changes
-------

Review request


Repository: hive-git


Description
-------

This patch is based on the Parquet vector API at https://github.com/nezihyigitbasi-nflx/parquet-mr/commits/vector

This proof of concept implements the general workflow end to end: two tests pass, and only the INT type is supported so far.
The idea is to create a VectorizedParquetRecordReader that wraps the ParquetRecordReader
provided by Parquet. Its next() method converts each Parquet RowBatch into a Hive VectorizedRowBatch.
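
The wrap-and-convert idea can be sketched as follows. This is only an illustration under stated assumptions, not the code in the patch: SourceBatch, TargetBatch, WrappingReader and sumAll are hypothetical stand-ins written for this example, and they do not reproduce the real Parquet RowBatch, Hive VectorizedRowBatch or ParquetRecordReader APIs. The sketch handles a single INT column and widens each value to long, on the assumption that the Hive side stores integer columns in a long array.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Self-contained sketch. Every type below is a hypothetical stand-in,
// not the real Parquet or Hive class.
public class VectorizedReaderSketch {

    /** Stand-in for a Parquet-side batch holding one INT column. */
    static final class SourceBatch {
        final int[] values;
        final int rowCount;

        SourceBatch(int[] values, int rowCount) {
            this.values = values;
            this.rowCount = rowCount;
        }
    }

    /** Stand-in for a Hive-side batch: one column of longs plus a row count. */
    static final class TargetBatch {
        final long[] vector;
        int size;

        TargetBatch(int capacity) {
            this.vector = new long[capacity];
        }
    }

    /** Wraps an inner batch source and converts each batch in next(). */
    static final class WrappingReader {
        private final Iterator<SourceBatch> inner;

        WrappingReader(Iterator<SourceBatch> inner) {
            this.inner = inner;
        }

        /** Fills 'out' from the next inner batch; returns false when exhausted. */
        boolean next(TargetBatch out) {
            if (!inner.hasNext()) {
                out.size = 0;
                return false;
            }
            SourceBatch in = inner.next();
            if (in.rowCount > out.vector.length) {
                throw new IllegalArgumentException("source batch exceeds target capacity");
            }
            for (int i = 0; i < in.rowCount; i++) {
                out.vector[i] = in.values[i]; // widen int to long
            }
            out.size = in.rowCount;
            return true;
        }
    }

    /** Helper for the example: reads every batch and sums all values. */
    static long sumAll(List<int[]> batches, int capacity) {
        List<SourceBatch> source = new ArrayList<>();
        for (int[] b : batches) {
            source.add(new SourceBatch(b, b.length));
        }
        WrappingReader reader = new WrappingReader(source.iterator());
        TargetBatch out = new TargetBatch(capacity); // reused across calls
        long sum = 0;
        while (reader.next(out)) {
            for (int i = 0; i < out.size; i++) {
                sum += out.vector[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<int[]> batches = new ArrayList<>();
        batches.add(new int[] {1, 2, 3});
        batches.add(new int[] {4, 5});
        System.out.println(sumAll(batches, 4));
    }
}
```

The target batch is allocated once and reused for every call to next(), which is the usual reason for reading in batches: the per-row cost reduces to a tight copy loop with no per-row object allocation.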

This is the first patch. Completing the vectorization feature requires follow-up work:
1) support all data types
2) support partition columns
3) add more test cases
4) evaluate performance on a real cluster


Diffs
-----

  pom.xml 1abf738 
  ql/pom.xml 6026c49 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java e1b6dd8 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java 98691c7 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java adeb971 
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedParquetReader.java PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorized_parquet_data_types.q PRE-CREATION 
  ql/src/test/results/clientpositive/vectorized_parquet_data_types.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/36540/diff/


Testing
-------

Unit tests passed.


Thanks,

Dong Chen

