hive-dev mailing list archives

From Misha Dmitriev <mi...@cloudera.com>
Subject Re: Review Request 57353: Intern Properties objects referenced from PartitionDesc to reduce memory pressure.
Date Fri, 21 Apr 2017 20:26:56 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57353/
-----------------------------------------------------------

(Updated April 21, 2017, 8:26 p.m.)


Review request for hive, Chaozhong Yang, Alan Gates, Rui Li, Prasanth_J, Sergio Pena, Sahil
Takiar, Vihang Karajgaonkar, and Xuefu Zhang.


Changes
-------

Addressed Sergio's comments.


Bugs: HIVE-16079
    https://issues.apache.org/jira/browse/HIVE-16079


Repository: hive-git


Description
-------

When multiple Hive queries run concurrently, a separate copy of
org.apache.hadoop.hive.ql.metadata.Partition and
ql.plan.PartitionDesc is created for each table partition
in each query instance. So in my benchmark described in
HIVE-16079, with 2000 partitions and 50 concurrent queries running
over them, we end up, in the worst case, with 2000*50=100,000 instances
of Partition and PartitionDesc in memory. These objects themselves
collectively take just ~2% of memory. However, the other data structures
that each of them references take a lot more. In particular, Properties
objects take more than 20% of memory. When we have 50 concurrent
read-only queries, there are 50 identical copies of Properties for
each partition. That's a huge waste of memory.

This change introduces a new class that extends Properties, called
CopyOnFirstWriteProperties. It uses a single interned copy of
Properties whenever possible. However, when one of the methods that
modify properties is called, a private copy is created first. When this
class is used, memory consumption by Properties falls from 20% to 5-6%.
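
To illustrate the copy-on-first-write idea described above, here is a
minimal sketch. It is not the CopyOnFirstWriteProperties class from the
patch: the class name CowPropertiesSketch, the static POOL map, the
sharesStorageWith helper and the "columns" key are all hypothetical, and
only getProperty and setProperty are overridden. A complete class would
need to override every read and every mutating method of Properties.

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of copy-on-first-write; NOT the class from the patch.
class CowPropertiesSketch extends Properties {
    // Assumed interning scheme: canonical instances keyed by content.
    // Properties.equals/hashCode compare entries, so equal contents collide.
    // Entries are strongly referenced here; a real pool might use weak references.
    private static final Map<Properties, Properties> POOL = new ConcurrentHashMap<>();

    // Shared, never-mutated instance; null once this object has its own copy.
    private Properties shared;

    CowPropertiesSketch(Properties source) {
        Properties snapshot = new Properties();
        snapshot.putAll(source);
        Properties previous = POOL.putIfAbsent(snapshot, snapshot);
        this.shared = (previous != null) ? previous : snapshot;
    }

    @Override
    public synchronized String getProperty(String key) {
        // Reads go to the shared instance until the first write.
        return (shared != null) ? shared.getProperty(key) : super.getProperty(key);
    }

    @Override
    public synchronized Object setProperty(String key, String value) {
        detach();
        return super.setProperty(key, value);
    }

    // Copy the shared entries into this object's own storage, once.
    private void detach() {
        if (shared != null) {
            super.putAll(shared);
            shared = null;
        }
    }

    // Helper for the demo: true if both objects still read from the same instance.
    synchronized boolean sharesStorageWith(CowPropertiesSketch other) {
        return shared != null && shared == other.shared;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("columns", "id,name"); // illustrative key and value
        CowPropertiesSketch q1 = new CowPropertiesSketch(base);
        CowPropertiesSketch q2 = new CowPropertiesSketch(base);
        System.out.println(q1.sharesStorageWith(q2));
        q1.setProperty("columns", "id"); // first write: q1 gets a private copy
        System.out.println(q1.sharesStorageWith(q2));
        System.out.println(q2.getProperty("columns"));
    }
}
```

In this sketch, read-only users of identical properties all point at one
pooled instance, and only an object that is actually modified pays for
its own copy.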


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/common/CopyOnFirstWriteProperties.java PRE-CREATION

  ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 247d5890ea8131404b9543d22876ca4c052578e0

  ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java d05c1c68fdb7296c0346d73967071da1ebe7bb72



Diff: https://reviews.apache.org/r/57353/diff/2/

Changes: https://reviews.apache.org/r/57353/diff/1-2/


Testing
-------


Thanks,

Misha Dmitriev

