hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs
Date Mon, 26 Nov 2018 10:03:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698693#comment-16698693 ]

Hive QA commented on HIVE-20760:
--------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 15s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 30s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 12s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 16s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 14s{color} | {color:red} common: The patch generated 3 new + 426 unchanged - 0 fixed = 429 total (was 426) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 38s{color} | {color:red} common generated 3 new + 65 unchanged - 0 fixed = 68 total (was 65) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 11m  5s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:common |
|  |  org.apache.hadoop.hive.common.HiveConfProperties.clone() does not call super.clone() At HiveConfProperties.java:[line 260] |
|  |  Inconsistent synchronization of org.apache.hadoop.hive.common.HiveConfProperties.interned; locked 70% of time. Unsynchronized access at HiveConfProperties.java:[line 108] |
|  |  org.apache.hadoop.hive.common.HiveConfProperties.getProperty(String, String) is unsynchronized, org.apache.hadoop.hive.common.HiveConfProperties.setProperty(String, String) is synchronized. At HiveConfProperties.java:[lines 123-130] |
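The three FindBugs findings all concern locking and cloning in a two-layer {{Properties}} subclass. The sketch below is one hypothetical way to write such a class so that both {{getProperty}} overloads take the same lock as the inherited synchronized {{setProperty}}, the shared {{interned}} field is final, and {{clone()}} goes through {{super.clone()}}. The class name {{ThreadSafeLayeredProperties}} is illustrative, not taken from the patch, and the sketch has not been run through FindBugs.

```java
import java.util.Properties;

// Hypothetical sketch (not the patch's HiveConfProperties): a shared base table
// plus per-instance values kept in the Properties superclass table.
class ThreadSafeLayeredProperties extends Properties {
  // Assigned once and never reassigned; 'final' also publishes the reference safely.
  private final Properties interned;

  ThreadSafeLayeredProperties(Properties interned) {
    this.interned = interned;
  }

  // Synchronized on 'this', the same lock the inherited setProperty() uses.
  @Override
  public synchronized String getProperty(String key) {
    String value = super.getProperty(key);
    return value != null ? value : interned.getProperty(key);
  }

  // Also synchronized, so get/set locking is consistent across both overloads.
  @Override
  public synchronized String getProperty(String key, String defaultValue) {
    String value = getProperty(key);
    return value != null ? value : defaultValue;
  }

  // Goes through super.clone(): the local entries are copied, and the 'interned'
  // reference is copied as-is, so the clone keeps sharing the same base table.
  @Override
  public synchronized ThreadSafeLayeredProperties clone() {
    return (ThreadSafeLayeredProperties) super.clone();
  }

  boolean sharesBaseWith(ThreadSafeLayeredProperties other) {
    return interned == other.interned;
  }
}
```

Because the local values live in the inherited table, {{super.clone()}} already produces an independent copy of them; only the read-only base is shared.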
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15055/dev-support/hive-personality.sh |
| git revision | master / 0fee288 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15055/yetus/diff-checkstyle-common.txt |
| findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-15055/yetus/new-findbugs-common.html |
| modules | C: common U: common |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15055/yetus.txt |
| Powered by | Apache Yetus    http://yetus.apache.org |


This message was automatically generated.



> Reducing memory overhead due to multiple HiveConfs
> --------------------------------------------------
>
>                 Key: HIVE-20760
>                 URL: https://issues.apache.org/jira/browse/HIVE-20760
>             Project: Hive
>          Issue Type: Improvement
>          Components: Configuration
>            Reporter: Barnabas Maidics
>            Assignee: Barnabas Maidics
>            Priority: Major
>         Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, HIVE-20760-3.patch, HIVE-20760.4.patch, HIVE-20760.5.patch, HIVE-20760.6.patch, HIVE-20760.7.patch, HIVE-20760.8.patch, HIVE-20760.patch, hiveconf_interned.html, hiveconf_original.html
>
>
> The issue is that every Hive task has to load its own copy of {{HiveConf}}. When running with a large number of cores per executor (Hive on Spark, HoS), a significant amount of memory (~10%) is wasted because of this duplication.
> I looked into the problem and found a way to reduce the overhead caused by the multiple HiveConf objects.
> I've created an implementation of Properties, somewhat similar to CopyOnFirstWriteProperties. CopyOnFirstWriteProperties can't be used to solve this problem because it drops the interned Properties as soon as a new property is added.
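To make the contrast concrete, copy-on-first-write semantics can be sketched as follows. This is a hypothetical illustration of the behavior described above, not the source of Hive's CopyOnFirstWriteProperties; the class and method names are made up. Reads go to the shared table until the first write, which copies every entry into a private table and drops the shared reference, so the memory saving disappears for that instance.

```java
import java.util.Properties;

// Hypothetical illustration of copy-on-first-write (not Hive's actual class).
class CopyOnFirstWriteSketch {
  private Properties shared; // shared table; null once this instance has written
  private Properties local;  // private full copy, created on the first write

  CopyOnFirstWriteSketch(Properties shared) {
    this.shared = shared;
  }

  String get(String key) {
    return local != null ? local.getProperty(key) : shared.getProperty(key);
  }

  void set(String key, String value) {
    if (local == null) {
      local = new Properties();
      local.putAll(shared); // copy everything: the table is no longer shared
      shared = null;        // the interned table is dropped
    }
    local.setProperty(key, value);
  }

  boolean stillShared() {
    return shared != null;
  }
}
```

Since nearly every session or task sets at least one property of its own, an instance like this ends up holding a full private copy almost immediately.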
> So my implementation looks like this:
>  * When we create a new HiveConf from an existing one (copy constructor), we replace the properties object stored by HiveConf with the new Properties implementation (HiveConfProperties). There are two possible ways to do this: either change the visibility of the properties field in the ancestor class (Configuration, which comes from Hadoop) to protected, or, more simply, swap in the new object using reflection.
>  * HiveConfProperties immediately interns the given properties. After that, every new property added to the HiveConf goes into an additional Properties object. This way, if we create multiple HiveConfs with the same base properties, they share the same interned Properties object, while each session/task can still add its own unique properties.
>  * Getting a property from HiveConfProperties looks like this (the non-interned properties are stored in the superclass):
                @Override
                public String getProperty(String key) {
                  String property = super.getProperty(key);  // session/task-specific value, if any
                  if (property == null) property = interned.getProperty(key);  // fall back to the shared base
                  return property;
                }
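The three steps above can be sketched end to end. This is a hypothetical, minimal version, not the patch's HiveConfProperties: {{LayeredProperties}} interns its base through a content-keyed pool (so equal bases share one object), keeps per-instance values in the superclass table, and falls back to the shared base on lookup. {{StandInConfiguration}} and {{ConfPatcher}} are made-up stand-ins for Hadoop's Configuration and for the reflection option; only the field name {{properties}} comes from the description.

```java
import java.lang.reflect.Field;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the layered scheme (not the patch's HiveConfProperties).
class LayeredProperties extends Properties {
  // Canonical base tables keyed by content (Properties equality compares entries).
  private static final ConcurrentHashMap<Properties, Properties> POOL =
      new ConcurrentHashMap<>();

  // Returns one shared instance for every base with the same entries.
  static Properties intern(Properties base) {
    Properties snapshot = new Properties();
    snapshot.putAll(base); // defensive copy; never mutated afterwards
    Properties existing = POOL.putIfAbsent(snapshot, snapshot);
    return existing != null ? existing : snapshot;
  }

  private final Properties interned; // shared between instances, read-only

  LayeredProperties(Properties base) {
    this.interned = intern(base);
  }

  @Override
  public synchronized String getProperty(String key) {
    // setProperty() writes to the superclass table, so per-instance values win.
    String property = super.getProperty(key);
    if (property == null) {
      property = interned.getProperty(key);
    }
    return property;
  }

  boolean sharesBaseWith(LayeredProperties other) {
    return interned == other.interned;
  }
}

// Hypothetical stand-in for Hadoop's Configuration and its private 'properties' field.
class StandInConfiguration {
  private Properties properties = new Properties();

  String get(String key) { return properties.getProperty(key); }

  void set(String key, String value) { properties.setProperty(key, value); }
}

// Replaces the private field from outside the class, as in the reflection option.
class ConfPatcher {
  static void installLayered(StandInConfiguration conf) {
    try {
      Field field = StandInConfiguration.class.getDeclaredField("properties");
      field.setAccessible(true);
      Properties current = (Properties) field.get(conf);
      field.set(conf, new LayeredProperties(current));
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot replace properties field", e);
    }
  }
}
```

The sketch layers only {{getProperty}}; {{get}}, {{keySet}}, {{entrySet}} and {{size}} still see just the per-instance table, so code that iterates the configuration would miss the base entries. The same limitation applies to the built-in defaults table of java.util.Properties ({{new Properties(defaults)}}), whose fallback is honored only by {{getProperty}}, {{propertyNames}} and {{stringPropertyNames}}.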
> Some tests show that the interning works (50 connections to HiveServer2; heap dumps taken after the sessions for the queries were created):
> || Measurement || Original || Interned ||
> | Overall memory | 34,599K | 20,582K |
> | Retained memory of HiveConfs | 16,366K | 10,804K |
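The relative savings follow directly from these figures. The per-connection line assumes, purely for illustration, that the HiveConf saving is spread evenly over the 50 sessions; the reports do not state that.

```latex
\begin{aligned}
\text{overall:}\quad & \frac{34{,}599\,\text{K} - 20{,}582\,\text{K}}{34{,}599\,\text{K}} = \frac{14{,}017\,\text{K}}{34{,}599\,\text{K}} \approx 40.5\% \\
\text{HiveConfs:}\quad & \frac{16{,}366\,\text{K} - 10{,}804\,\text{K}}{16{,}366\,\text{K}} = \frac{5{,}562\,\text{K}}{16{,}366\,\text{K}} \approx 34.0\% \\
\text{per connection (assumed even split):}\quad & \frac{5{,}562\,\text{K}}{50} \approx 111\,\text{K}
\end{aligned}
```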
> I've attached the JXray reports of the heap dumps.
> What are your thoughts about this solution? 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
