hive-dev mailing list archives

From "cheng xu" <cheng.a...@intel.com>
Subject Re: Review Request 31800: HIVE-9658 Reduce parquet memory use by bypassing java primitive objects on ETypeConverter
Date Mon, 23 Mar 2015 00:53:59 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31800/#review77362
-----------------------------------------------------------

Ship it!

- cheng xu


On March 20, 2015, 3:59 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31800/
> -----------------------------------------------------------
> 
> (Updated March 20, 2015, 3:59 p.m.)
> 
> 
> Review request for hive, Ryan Blue and cheng xu.
> 
> 
> Bugs: HIVE-9658
>     https://issues.apache.org/jira/browse/HIVE-9658
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch passes Java primitive objects directly to the Hive object inspectors instead of wrapping them in primitive Writable objects.
> This helps reduce memory usage.
> 
> I did not bypass the Writable objects for other, more complex types, such as binary, decimal, and date/timestamp, because their Writable objects are needed in other parts of the code,
> and creating them later costs more time than creating them up front.
> 
> 
> Diffs
> -----
> 
>   itests/hive-jmh/src/main/java/org/apache/hive/benchmark/storage/ColumnarStorageBench.java 4f6985cd13017ce37f4f0c100b16a27aa5b02f8b
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java c915f728fc9b27da0fabefab5d8f5faa53640b78
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 0391229723cc3ecef551fa44b8456b0d2ac93fb5
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java d7edd52614771857d1b21971a66894841c248ef9
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ConverterParent.java 6ff6b473c9f1867bc14bb597094ddb92487cc954
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableRecordConverter.java a43661eb54ba29692c07c264584b5aecf648ef99
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 3fc012970e23bbc188ce2a2e2ba0b04bc6f22317
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveCollectionConverter.java f1c8b6f13718b37f590263e5b35ed6c327f5cf4f
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveGroupConverter.java c6d03a19029d5bcc86b998dd7a8609973648c103
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java f95d15eddc21bc432fa53572de5756751a13341a
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/Repeated.java ee57b31dac53d99af0c5a520f51102796ca32fd3
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 57ae7a9740d55b407cadfc8bc030593b29f90700
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java a26199612cf338e336f210f29acb0398c536e1f9
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 49bf1c5325833993f4c09efdf1546af560783c28
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java 609188206f88e296d893b84bcaaab53f974e6b7d
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/DeepParquetHiveMapInspector.java 143d72e76502d4877e8208181d9743259051dcea
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ObjectArrayWritableObjectInspector.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveArrayInspector.java bde0dcbb3978ba47b15ae2c9bbe2f87ed3984ab1
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 7fd5e9612d4e3c9bf3b816bc48dbdbe59fb8a5a8
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/StandardParquetHiveMapInspector.java 22250b30a14d52907fb22d4f44b93c7633c6a89e
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetByteInspector.java 864f56292fa4856df155f546064e4a6732cc663f
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetShortInspector.java 39f265777c7e164382117e3902c3b6e491295f70
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java 3a476731e31bf38822f0d530f0aea2eadb675a49
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestArrayCompatibility.java d45d8eeb9e8a61f254098ab15d0305fc71152abd
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java 8f03c5b403332f7b36b2271a2246a0fc90b3bfba
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapStructures.java 3c7401ffbe88ce66b96f9cceab4e9c3d6267f8fe
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetInputFormat.java 1a54bf5797efd5859c9e665bcc7134168e5d193f
>   serde/src/java/org/apache/hadoop/hive/serde2/io/ObjectArrayWritable.java PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/31800/diff/
> 
> 
> Testing
> -------
> 
> Some performance tests were done to validate this.
> 
> Schema: int,double,boolean,string,array<int>,map<string,string>,struct<a:int,b:int>
>   
> - JMH (Microbenchmarks) calls on parquet reads.
>   
>   Before: 579 ops/s
>   After:  651 ops/s
> 
> - YourKit Java Profiler to measure memory objects recorded.
>   Reading 20,000 random rows (10 times)
>   
>   Before:
>      Objects recorded:   1,863,610
>      Objects size:       42,373,808
>      Total memory usage: 29%
>      
>   After:
>      Objects recorded:   1,596,804
>      Objects size:       34,192,832
>      Total memory usage: 24%
> 
> All tests were run multiple times and gave consistent results.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>
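
For reference, the before/after figures in the Testing section of the quoted request can be turned into relative changes with a few lines of Java. This is only a convenience sketch: the class and method names (BenchmarkDelta, percentChange) are made up for illustration and are not part of the patch, and the input numbers are copied from the request as reported.

```java
import java.util.Locale;

// Hypothetical helper, not part of HIVE-9658: computes relative change
// from the before/after numbers reported in the review request.
final class BenchmarkDelta {

    // Relative change from 'before' to 'after', in percent.
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    static String describe(String label, double before, double after) {
        // Locale.ROOT keeps the decimal separator stable across machines.
        return String.format(Locale.ROOT, "%s: %+.1f%%", label, percentChange(before, after));
    }

    public static void main(String[] args) {
        System.out.println(describe("JMH throughput (ops/s)", 579, 651));
        System.out.println(describe("Objects recorded", 1_863_610, 1_596_804));
        System.out.println(describe("Objects size", 42_373_808, 34_192_832));
    }
}
```

With the reported numbers this works out to roughly 12.4% higher read throughput, 14.3% fewer recorded objects, and a 19.3% smaller recorded object size.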


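The idea in the Description of the quoted request, handing Java primitive objects to the inspectors rather than wrapping each value in a Writable first, can be illustrated with a minimal, self-contained sketch. Everything here is an assumption made for illustration: IntBox stands in for a Writable-style wrapper, and WritableBypassSketch, convertWithWrappers, convertDirect, and inspectInt are hypothetical names that do not come from the patch or from the Hive or Hadoop APIs.

```java
// Hypothetical stand-in for a Writable-style wrapper; not a real Hadoop class.
final class IntBox {
    private final int value;

    IntBox(int value) {
        this.value = value;
    }

    int get() {
        return value;
    }
}

// Illustrative only; none of these methods exist in Hive.
final class WritableBypassSketch {

    // "Before" style: every value read gets its own dedicated wrapper object.
    static Object[] convertWithWrappers(int[] column) {
        Object[] row = new Object[column.length];
        for (int i = 0; i < column.length; i++) {
            row[i] = new IntBox(column[i]);
        }
        return row;
    }

    // "After" style: store the boxed Java primitive directly.
    // Integer.valueOf reuses cached instances for values in -128..127.
    static Object[] convertDirect(int[] column) {
        Object[] row = new Object[column.length];
        for (int i = 0; i < column.length; i++) {
            row[i] = Integer.valueOf(column[i]);
        }
        return row;
    }

    // An inspector-like accessor that accepts either representation.
    static int inspectInt(Object field) {
        if (field instanceof Integer) {
            return (Integer) field;
        }
        if (field instanceof IntBox) {
            return ((IntBox) field).get();
        }
        throw new IllegalArgumentException("Unsupported field type: " + field);
    }

    public static void main(String[] args) {
        int[] column = {1, 2, 300};
        Object[] wrapped = convertWithWrappers(column);
        Object[] direct = convertDirect(column);
        for (int i = 0; i < column.length; i++) {
            System.out.println(inspectInt(wrapped[i]) + " " + inspectInt(direct[i]));
        }
    }
}
```

The sketch only shows the shape of the change: the reader side produces plain Java objects, and the inspector side has to understand them. It says nothing about how the actual patch structures its converters or inspectors.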