hive-dev mailing list archives

From "Dong Chen" <>
Subject Re: Review Request 36540: HIVE-8128: Improve Parquet Vectorization
Date Tue, 21 Jul 2015 08:44:47 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated July 21, 2015, 8:44 a.m.)

Review request for hive, Ryan Blue, cheng xu, and Sergio Pena.


Review request

Repository: hive-git


This patch is based on the Parquet vector API at

In this POC, the general workflow is in place, two tests pass, and the INT type is supported.
The idea is to create a VectorizedParquetRecordReader that wraps the ParquetRecordReader
provided by Parquet. Its next() method then converts each Parquet RowBatch into a Hive VectorizedRowBatch.
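
The wrap-and-convert idea described above can be sketched in a few lines. This is only an illustration, not the code in the patch: SourceBatch, TargetBatch, BatchSource, WrappingReader, and sumAll are hypothetical stand-ins for Parquet's RowBatch, Hive's VectorizedRowBatch, the wrapped ParquetRecordReader, and VectorizedParquetRecordReader, and only a single INT column is modeled. Widening int to long follows the way Hive's LongColumnVector keeps integer values in a long array.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class VectorizedReaderSketch {

    /** Hypothetical stand-in for a Parquet row batch holding one INT column. */
    static final class SourceBatch {
        final int[] values;
        SourceBatch(int[] values) { this.values = values; }
    }

    /** Hypothetical stand-in for a Hive vectorized batch with one long column. */
    static final class TargetBatch {
        final long[] vector;
        int size; // number of valid rows currently in the batch
        TargetBatch(int capacity) { this.vector = new long[capacity]; }
    }

    /** Hypothetical stand-in for the wrapped reader; returns null when exhausted. */
    interface BatchSource {
        SourceBatch nextBatch();
    }

    /** Wraps a batch source and converts each source batch in next(). */
    static final class WrappingReader {
        private final BatchSource inner;
        WrappingReader(BatchSource inner) { this.inner = inner; }

        /** Fills 'out' from the next source batch; returns false at end of input. */
        boolean next(TargetBatch out) {
            SourceBatch in = inner.nextBatch();
            if (in == null) {
                out.size = 0;
                return false;
            }
            if (in.values.length > out.vector.length) {
                throw new IllegalStateException("source batch exceeds target capacity");
            }
            // Copy the column, widening each int to long.
            for (int i = 0; i < in.values.length; i++) {
                out.vector[i] = in.values[i];
            }
            out.size = in.values.length;
            return true;
        }
    }

    /** Drains the reader, reusing one target batch, and sums every value read. */
    static long sumAll(List<int[]> batches, int capacity) {
        Iterator<int[]> it = batches.iterator();
        BatchSource source = () -> it.hasNext() ? new SourceBatch(it.next()) : null;
        WrappingReader reader = new WrappingReader(source);
        TargetBatch out = new TargetBatch(capacity);
        long sum = 0;
        while (reader.next(out)) {
            for (int i = 0; i < out.size; i++) {
                sum += out.vector[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<int[]> batches = Arrays.asList(new int[] {1, 2, 3}, new int[] {4, 5});
        System.out.println(sumAll(batches, 4));
    }
}
```

The caller allocates one target batch and passes it to every next() call, so the conversion step only copies values and resets the row count instead of allocating a new batch each time.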

This is the first patch. To complete the vectorization feature, the following work remains for follow-ups:
1) support all data types
2) support partition columns
3) add more test cases
4) evaluate performance on a real cluster


  pom.xml 1abf738 
  ql/pom.xml 6026c49 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ e1b6dd8 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ 98691c7

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ adeb971

  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/ PRE-CREATION

  ql/src/test/queries/clientpositive/vectorized_parquet_data_types.q PRE-CREATION 
  ql/src/test/results/clientpositive/vectorized_parquet_data_types.q.out PRE-CREATION 



Unit tests passed.


Dong Chen
