hive-dev mailing list archives

From "Sergio Pena" <sergio.p...@cloudera.com>
Subject Re: Review Request 31800: HIVE-9658 Reduce parquet memory use by bypassing java primitive objects on ETypeConverter
Date Tue, 10 Mar 2015 18:02:42 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31800/
-----------------------------------------------------------

(Updated March 10, 2015, 6:02 p.m.)


Review request for hive, Ryan Blue and cheng xu.


Changes
-------

Updated patch with the changes required by the trunk merge into the parquet branch.


Bugs: HIVE-9658
    https://issues.apache.org/jira/browse/HIVE-9658


Repository: hive-git


Description
-------

This patch passes primitive Java objects directly to the Hive object inspectors instead of
wrapping them in primitive Writable objects.
This reduces memory usage.
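To make the idea concrete, here is a minimal, self-contained sketch of the two paths. It is not the actual ETypeConverter code: `IntWritableStandIn` is a hypothetical substitute for Hadoop's `IntWritable` (so the example compiles without Hadoop), and `Row`, `addIntWrapped`, `addIntDirect` and `getInt` are hypothetical names. The patch itself adds an `ObjectArrayWritable` class and a matching object inspector, whose real APIs may differ from this sketch.

```java
/**
 * Hypothetical sketch (not Hive code): contrasts storing a converted Parquet int
 * as a Writable-style wrapper versus passing the plain Java value through.
 */
public class BypassSketch {

    // Hypothetical stand-in for org.apache.hadoop.io.IntWritable, so the sketch
    // compiles without Hadoop on the classpath.
    static final class IntWritableStandIn {
        private final int value;
        IntWritableStandIn(int value) { this.value = value; }
        int get() { return value; }
    }

    // Hypothetical row holder: the converter writes one value per column index.
    static final class Row {
        final Object[] values;
        Row(int columns) { values = new Object[columns]; }
        void set(int index, Object value) { values[index] = value; }
    }

    // Before: every int is wrapped in a new Writable-style object.
    static void addIntWrapped(Row row, int index, int value) {
        row.set(index, new IntWritableStandIn(value));
    }

    // After: the value is handed over as a plain Java object (autoboxed Integer).
    static void addIntDirect(Row row, int index, int value) {
        row.set(index, value);
    }

    // Hypothetical inspector accessor that accepts either representation,
    // so readers keep working whichever path produced the value.
    static int getInt(Object o) {
        if (o instanceof Integer) {
            return (Integer) o;
        }
        return ((IntWritableStandIn) o).get();
    }

    public static void main(String[] args) {
        Row row = new Row(2);
        addIntWrapped(row, 0, 42);
        addIntDirect(row, 1, 42);
        System.out.println(getInt(row.values[0]) + " " + getInt(row.values[1]));
    }
}
```

Running `main` prints `42 42`: both representations read back the same value, but the direct path does not create a Writable-style wrapper (autoboxing may still allocate an `Integer` for values outside the JVM's small-integer cache).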

I did not apply the bypass to more complex types, such as binary, decimal, and date/timestamp,
because their Writable objects are needed in other parts of the code, and creating them later
would lower throughput (fewer ops/s). It is better to create them once, at conversion time.
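The type split described above can be summarized as a simple per-type policy. This is a hypothetical sketch: the `ColumnType` enum and the `passesRawJavaValue` helper are illustrative names, not Parquet or Hive types, and treating long and float as bypassed primitives is an assumption drawn from the description rather than from the diff.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of the type split described in the patch notes. */
public class WritablePolicySketch {

    // Hypothetical type tags; not Parquet's or Hive's actual type enums.
    enum ColumnType { BOOLEAN, INT, LONG, FLOAT, DOUBLE, BINARY, DECIMAL, DATE, TIMESTAMP }

    // Types whose Writable objects are still needed downstream, so they are
    // built once at conversion time instead of being recreated later.
    static final Set<ColumnType> KEEP_WRITABLE =
        EnumSet.of(ColumnType.BINARY, ColumnType.DECIMAL, ColumnType.DATE, ColumnType.TIMESTAMP);

    // Primitive values bypass the Writable wrapper; everything else keeps it.
    static boolean passesRawJavaValue(ColumnType type) {
        return !KEEP_WRITABLE.contains(type);
    }

    public static void main(String[] args) {
        for (ColumnType t : ColumnType.values()) {
            System.out.println(t + " -> " + (passesRawJavaValue(t) ? "raw Java value" : "Writable"));
        }
    }
}
```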


Diffs (updated)
-----

  itests/hive-jmh/src/main/java/org/apache/hive/benchmark/storage/ColumnarStorageBench.java 4f6985cd13017ce37f4f0c100b16a27aa5b02f8b
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java c915f728fc9b27da0fabefab5d8f5faa53640b78
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 0391229723cc3ecef551fa44b8456b0d2ac93fb5
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java d7edd52614771857d1b21971a66894841c248ef9
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ConverterParent.java 6ff6b473c9f1867bc14bb597094ddb92487cc954
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableRecordConverter.java a43661eb54ba29692c07c264584b5aecf648ef99
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 3fc012970e23bbc188ce2a2e2ba0b04bc6f22317
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveCollectionConverter.java f1c8b6f13718b37f590263e5b35ed6c327f5cf4f
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveGroupConverter.java c6d03a19029d5bcc86b998dd7a8609973648c103
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java f95d15eddc21bc432fa53572de5756751a13341a
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/Repeated.java ee57b31dac53d99af0c5a520f51102796ca32fd3
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 57ae7a9740d55b407cadfc8bc030593b29f90700
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java a26199612cf338e336f210f29acb0398c536e1f9
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java.orig PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 49bf1c5325833993f4c09efdf1546af560783c28
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java 609188206f88e296d893b84bcaaab53f974e6b7d
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/DeepParquetHiveMapInspector.java 143d72e76502d4877e8208181d9743259051dcea
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ObjectArrayWritableObjectInspector.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveArrayInspector.java bde0dcbb3978ba47b15ae2c9bbe2f87ed3984ab1
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 7fd5e9612d4e3c9bf3b816bc48dbdbe59fb8a5a8
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/StandardParquetHiveMapInspector.java 22250b30a14d52907fb22d4f44b93c7633c6a89e
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetByteInspector.java 864f56292fa4856df155f546064e4a6732cc663f
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetShortInspector.java 39f265777c7e164382117e3902c3b6e491295f70
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java 3a476731e31bf38822f0d530f0aea2eadb675a49
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestArrayCompatibility.java d45d8eeb9e8a61f254098ab15d0305fc71152abd
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java 8f03c5b403332f7b36b2271a2246a0fc90b3bfba
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapStructures.java 3c7401ffbe88ce66b96f9cceab4e9c3d6267f8fe
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetInputFormat.java 1a54bf5797efd5859c9e665bcc7134168e5d193f
  serde/src/java/org/apache/hadoop/hive/serde2/io/ObjectArrayWritable.java PRE-CREATION

Diff: https://reviews.apache.org/r/31800/diff/


Testing
-------

I ran the following performance tests to validate the change.

Schema: int,double,boolean,string,array<int>,map<string,string>,struct<a:int,b:int>
  
- JMH microbenchmarks of Parquet reads (throughput):
  
  Before: 579 ops/s
  After:  651 ops/s

- YourKit Java Profiler, measuring the objects recorded while
  reading 20,000 random rows (repeated 10 times):
  
  Before:
     Objects recorded:   1,863,610
     Objects size:       42,373,808
     Total memory usage: 29%
     
  After:
     Objects recorded:   1,596,804
     Objects size:       34,192,832
     Total memory usage: 24%

All tests were run multiple times and produced consistent results.
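For scale, the relative changes implied by the figures above can be worked out with a few lines of Java. The inputs are copied from this section; `BenchmarkDeltas` and `percentChange` are illustrative names, and since total memory usage is already a percentage, its change is reported in percentage points.

```java
import java.util.Locale;

public class BenchmarkDeltas {
    // Relative change in percent: positive means the value grew after the patch.
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Before/after figures taken from the JMH and YourKit results.
        System.out.printf(Locale.ROOT, "Throughput (ops/s):  %+.1f%%%n", percentChange(579, 651));
        System.out.printf(Locale.ROOT, "Objects recorded:    %+.1f%%%n", percentChange(1_863_610, 1_596_804));
        System.out.printf(Locale.ROOT, "Objects size:        %+.1f%%%n", percentChange(42_373_808, 34_192_832));
        // Total memory usage is already a percentage, so report the difference in points.
        System.out.printf(Locale.ROOT, "Total memory usage:  %+d points%n", 24 - 29);
    }
}
```

This prints roughly +12.4% throughput, -14.3% objects recorded, -19.3% objects size, and -5 points of total memory usage.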


Thanks,

Sergio Pena

