hive-dev mailing list archives

From "Sergio Pena" <sergio.p...@cloudera.com>
Subject Review Request 35950: HIVE-11131: Get row information on DataWritableWriter once for better writing performance
Date Fri, 26 Jun 2015 22:58:12 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35950/
-----------------------------------------------------------

Review request for hive, Ryan Blue, cheng xu, and Dong Chen.


Bugs: HIVE-11131
    https://issues.apache.org/jira/browse/HIVE-11131


Repository: hive-git


Description
-------

Implemented data type writers that are created before the first Hive row is written to
Parquet. Each writer holds the object inspector and schema information for a specific
data type, and calls the specific addXXXX() method that Parquet uses for that data type.
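
As a rough illustration of the idea (not the code in the diff), the sketch below builds
one writer per column once, from the schema, and then reuses those writers for every
row, so the type dispatch is not repeated per value. All names here (Consumer,
FieldWriter, FieldType, RowWriter, RecordingConsumer) are hypothetical stand-ins; the
real patch works with Hive object inspectors and Parquet's record consumer, which are
left out to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

public class PrecomputedWriterSketch {

    /** Hypothetical stand-in for the Parquet-side sink with typed add methods. */
    public interface Consumer {
        void addInteger(int value);
        void addLong(long value);
        void addBoolean(boolean value);
        void addString(String value);
    }

    /** A writer bound to one column type; created once, used for every row. */
    public interface FieldWriter {
        void write(Object value, Consumer out);
    }

    /** Hypothetical, simplified column types. */
    public enum FieldType { INT, BIGINT, BOOLEAN, STRING }

    /** The type dispatch happens here, once per column, not once per value. */
    public static FieldWriter createWriter(FieldType type) {
        switch (type) {
            case INT:
                return (value, out) -> out.addInteger((Integer) value);
            case BIGINT:
                return (value, out) -> out.addLong((Long) value);
            case BOOLEAN:
                return (value, out) -> out.addBoolean((Boolean) value);
            case STRING:
                return (value, out) -> out.addString((String) value);
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    /** Holds the per-column writers built from the schema before any row is written. */
    public static final class RowWriter {
        private final FieldWriter[] writers;

        public RowWriter(List<FieldType> schema) {
            writers = new FieldWriter[schema.size()];
            for (int i = 0; i < writers.length; i++) {
                writers[i] = createWriter(schema.get(i));
            }
        }

        /** Writes one row; null values are skipped in this sketch. */
        public void write(Object[] row, Consumer out) {
            for (int i = 0; i < writers.length; i++) {
                if (row[i] != null) {
                    writers[i].write(row[i], out);
                }
            }
        }
    }

    /** Test helper that records which add method was called with which value. */
    public static final class RecordingConsumer implements Consumer {
        public final List<String> calls = new ArrayList<>();

        @Override public void addInteger(int value) { calls.add("addInteger(" + value + ")"); }
        @Override public void addLong(long value) { calls.add("addLong(" + value + ")"); }
        @Override public void addBoolean(boolean value) { calls.add("addBoolean(" + value + ")"); }
        @Override public void addString(String value) { calls.add("addString(" + value + ")"); }
    }
}
```

A RowWriter would be constructed once per file (or per schema) and its write() method
called for each row. The per-row cost is then one array lookup and one call per column
instead of a type inspection per value.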


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java c195c3ec3ddae19bf255fc2c9633f8bf4390f428


Diff: https://reviews.apache.org/r/35950/diff/


Testing
-------

Tests from TestDataWritableWriter ran OK.

I also ran micro-benchmarks, and the new implementation gave better results:

Using repeated rows across the file, the speed increased by:

bigint	boolean	double	float	int	string
33.42%	53.66%	35.62%	35.70%	36.02%	5.93%

Using random rows across the file, the speed increased by:

bigint	boolean	double	float	int	string
18.38%	35.52%	44.73%	13.80%	10.68%	10.00%


Thanks,

Sergio Pena

