hive-dev mailing list archives

From Misha Dmitriev <mi...@cloudera.com>
Subject Re: Review Request 57353: Intern Properties objects referenced from PartitionDesc to reduce memory pressure.
Date Fri, 21 Apr 2017 20:15:51 GMT


> On April 21, 2017, 4:35 p.m., Sergio Pena wrote:
> > Misha, where is CopyOnFirstWriteProperties used? The patch looks pretty good, but
I don't see where CopyOnFirstWriteProperties is instantiated.
> 
> Misha Dmitriev wrote:
>     It's not instantiated directly. Rather, see the serialization/deserialization code
in SerializationUtilities.java, where this class is indirectly instantiated. My understanding
is that this is how Partitions and their child data structures are created, by transferring
data from HMS.
> 
> Sergio Pena wrote:
>     I still have not found how this happens. Could you describe how you understand it?
Maybe I can follow you better than the code.

Right, now I understand what you mean. I made a mistake while making some final edits to this
code. A new CopyOnFirstWriteProperties instance should be created in the setProperties() method
of PartitionDesc. I'll make a fix and post a new patch.
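
To make the intended wiring concrete, here is a minimal sketch of the idea. It is not the code from the patch: CowPropertiesSketch and PartitionDescSketch are hypothetical stand-ins, the interning map is my own assumption, and only a few Properties methods are overridden, so treat it as an illustration of copy-on-first-write plus wrapping in a setter rather than a complete implementation.

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of a copy-on-first-write Properties wrapper.
 * Reads go to a shared, interned snapshot; the first write copies the
 * snapshot into this object's own table and detaches from it.
 * Assumption: a full version would override every read method and every
 * mutator of Properties/Hashtable (size, keySet, equals, clear, ...).
 */
class CowPropertiesSketch extends Properties {
    private static final long serialVersionUID = 1L;

    // Canonical snapshots keyed by content. Never evicted in this sketch.
    private static final ConcurrentHashMap<Properties, Properties> INTERNED =
            new ConcurrentHashMap<>();

    // Interned snapshot, never mutated; null once this object owns a copy.
    private Properties shared;

    CowPropertiesSketch(Properties source) {
        this.shared = intern(source);
    }

    private static Properties intern(Properties source) {
        if (source instanceof CowPropertiesSketch) {
            CowPropertiesSketch cow = (CowPropertiesSketch) source;
            synchronized (cow) {
                if (cow.shared != null) {
                    return cow.shared; // already canonical, reuse it
                }
            }
        }
        Properties snapshot = new Properties();
        snapshot.putAll(source); // note: the defaults chain is not copied
        Properties previous = INTERNED.putIfAbsent(snapshot, snapshot);
        return previous != null ? previous : snapshot;
    }

    private void copyOnFirstWrite() {
        if (shared != null) {
            Properties snapshot = shared;
            shared = null; // detach before copying
            for (Map.Entry<Object, Object> e : snapshot.entrySet()) {
                super.put(e.getKey(), e.getValue());
            }
        }
    }

    @Override
    public synchronized String getProperty(String key) {
        return shared != null ? shared.getProperty(key) : super.getProperty(key);
    }

    // Properties.setProperty() is documented to call put(), so it lands here.
    @Override
    public synchronized Object put(Object key, Object value) {
        copyOnFirstWrite();
        return super.put(key, value);
    }

    @Override
    public synchronized Object remove(Object key) {
        copyOnFirstWrite();
        return super.remove(key);
    }

    // Helper for inspection only; callers must not mutate the result.
    synchronized Properties sharedOrNull() {
        return shared;
    }
}

/** Hypothetical stand-in for a descriptor class with a setProperties() method. */
class PartitionDescSketch {
    private Properties properties;

    void setProperties(Properties properties) {
        // Wrap on assignment so equal property sets share one interned copy.
        this.properties = new CowPropertiesSketch(properties);
    }

    Properties getProperties() {
        return properties;
    }
}
```

With this shape, two descriptors that receive equal property sets hold separate thin wrappers over one snapshot, and a write through one wrapper does not affect the other. The unbounded static map is a simplification; some form of weak or bounded interning would probably be needed in practice.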


- Misha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57353/#review172668
-----------------------------------------------------------


On March 7, 2017, 1:22 a.m., Misha Dmitriev wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57353/
> -----------------------------------------------------------
> 
> (Updated March 7, 2017, 1:22 a.m.)
> 
> 
> Review request for hive, Chaozhong Yang, Alan Gates, Rui Li, Prasanth_J, Sergio Pena,
Sahil Takiar, Vihang Karajgaonkar, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-16079
>     https://issues.apache.org/jira/browse/HIVE-16079
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> When multiple concurrent Hive queries run, a separate copy of
> org.apache.hadoop.hive.ql.metadata.Partition and
> ql.plan.PartitionDesc is created for each table partition
> in each query instance. So when, in my benchmark explained in
> HIVE-16079, we have 2000 partitions and 50 concurrent queries running
> over them, we end up, in the worst case, with 2000*50=100,000 instances
> of Partition and PartitionDesc in memory. These objects themselves
> collectively take just ~2% of memory. However, the other data structures
> that each of them references take a lot more. In particular, Properties
> objects take more than 20% of memory. When we have 50 concurrent
> read-only queries, there are 50 identical copies of Properties for
> each partition. That's a huge waste of memory.
> 
> This change introduces a new class that extends Properties, called
> CopyOnFirstWriteProperties. It uses a unique interned copy of
> Properties whenever possible. However, when one of the methods that
> modify properties is called, a copy is created. When this class is
> used, memory consumption by Properties falls from 20% to 5-6%.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/CopyOnFirstWriteProperties.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java 247d5890ea8131404b9543d22876ca4c052578e0
>   ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java d05c1c68fdb7296c0346d73967071da1ebe7bb72
> 
> 
> Diff: https://reviews.apache.org/r/57353/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Misha Dmitriev
> 
>

