hive-dev mailing list archives

From "Remus Rusanu" <>
Subject Re: Review Request 17899: HIVE-5998 Add vectorized reader for Parquet files
Date Thu, 13 Feb 2014 08:26:37 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated Feb. 13, 2014, 8:26 a.m.)

Review request for hive, Brock Noland, Eric Hanson, and Jitendra Pandey.

Bugs: HIVE-5998

Repository: hive-git


The implementation is straightforward and very simple, but it offers all the benefits of vectorization
possible with a 'shallow' vectorized reader (i.e. one that does not require changes to the parquet-mr
project). The only complication arose from discrepancies between the object inspector seen by the
InputFormat and the actual output provided by the Parquet readers (e.g. the OI declares 'byte'
primitives but the Parquet reader outputs IntWritable). I had to create a just-in-time
VectorColumnAssigner collection based on whatever writables the Parquet record reader provides.
It is assumed that the reader does not change its output types during the iteration.
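The just-in-time assigner idea can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not the patch's code. `JitAssignerSketch`, `LongColumn` and `Assigner` are hypothetical names standing in for Hive's column vectors and VectorColumnAssigner, and boxed Java types (Integer, Boolean) stand in for Hadoop Writables such as IntWritable. Each column's assigner is chosen from the runtime class of the first non-null value the reader returns, not from the declared object inspector type, and is then reused. That reuse is only safe if the reader's output classes stay stable across the iteration.

```java
// Hypothetical sketch: all class names here are illustrative, not Hive's API.
public class JitAssignerSketch {

    /** Simplified stand-in for a long-backed column vector (integer families widened to long). */
    static final class LongColumn {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true;

        LongColumn(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    /** Hypothetical assigner: copies one row value into one column slot. */
    interface Assigner {
        void assign(LongColumn col, int row, Object value);
    }

    /** Picks an assigner from the class the reader actually produced, ignoring the declared type. */
    static Assigner assignerFor(Object sample) {
        if (sample instanceof Byte || sample instanceof Short
                || sample instanceof Integer || sample instanceof Long) {
            return (col, row, v) -> col.vector[row] = ((Number) v).longValue();
        }
        if (sample instanceof Boolean) {
            return (col, row, v) -> col.vector[row] = ((Boolean) v) ? 1L : 0L;
        }
        throw new IllegalArgumentException("Unsupported value class: " + sample.getClass());
    }

    // One assigner per column, created lazily on the first non-null value.
    private final Assigner[] assigners;

    JitAssignerSketch(int columns) {
        assigners = new Assigner[columns];
    }

    /** Copies one row into the batch; assumes value classes do not change between rows. */
    void assignRow(LongColumn[] batch, int row, Object[] values) {
        for (int c = 0; c < values.length; c++) {
            Object v = values[c];
            if (v == null) {
                batch[c].isNull[row] = true;
                batch[c].noNulls = false;
                continue;
            }
            if (assigners[c] == null) {
                assigners[c] = assignerFor(v);
            }
            assigners[c].assign(batch[c], row, v);
        }
    }

    public static void main(String[] args) {
        // Column 0 is declared 'byte' but the reader hands back Integer values.
        LongColumn[] batch = { new LongColumn(3), new LongColumn(3) };
        JitAssignerSketch sketch = new JitAssignerSketch(2);
        Object[][] rows = {
            {7, true},
            {null, false},
            {-3, true},
        };
        for (int r = 0; r < rows.length; r++) {
            sketch.assignRow(batch, r, rows[r]);
        }
        System.out.println(batch[0].vector[0] + " " + batch[0].isNull[1] + " " + batch[1].vector[2]);
    }
}
```

Running `main` prints `7 true 1`. A column declared as 'byte' but delivered as Integer still lands correctly in the long-backed column, because the assigner is keyed on the delivered class and all integer families are widened to long, similar to how Hive's LongColumnVector stores integer types.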

Diffs (updated)

  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ d1a75df

  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ 0b504de 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ d409d44 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ d3412df 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ PRE-CREATION

  ql/src/test/queries/clientpositive/vectorized_parquet.q PRE-CREATION 
  ql/src/test/results/clientpositive/vectorized_parquet.q.out PRE-CREATION 


Testing (updated)

Manually tested. A new .q query test (vectorized_parquet.q) was added.


Remus Rusanu
