hadoop-yarn-issues mailing list archives

From "Misha Dmitriev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7386) Duplicate Strings in various places in Yarn memory
Date Fri, 27 Oct 2017 20:38:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16222767#comment-16222767 ]

Misha Dmitriev commented on YARN-7386:

[~rkanter] could you please look at the test failure above? I cannot reproduce it locally,
and in any case my change only interns some strings, which is about as safe as a change
can be. So I suspect this is just a flaky test.

> Duplicate Strings in various places in Yarn memory
> --------------------------------------------------
>                 Key: YARN-7386
>                 URL: https://issues.apache.org/jira/browse/YARN-7386
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>         Attachments: YARN-7386.01.patch, YARN-7386.02.patch
> Using jxray (www.jxray.com), I analyzed a YARN RM heap dump obtained from a large cluster.
> The tool uncovered several sources of memory waste. One problem is duplicate strings:
> {code}
> Total strings   Unique strings   Duplicate values   Overhead
>       361,506           86,672              5,928   22,886K (7.6%)
> {code}
> These duplicates are spread across many locations. The biggest source of waste is the following
> reference chain:
> {code}
> 7,416K (2.5%), 31292 / 62% dup strings (499 unique), 31292 dup backing arrays:
> ↖{j.u.HashMap}.values
> ↖org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.environment
> ↖org.apache.hadoop.yarn.api.records.impl.pb.ApplicationSubmissionContextPBImpl.amContainer
> ↖org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.submissionContext
> ↖{java.util.concurrent.ConcurrentHashMap}.values
> ↖org.apache.hadoop.yarn.server.resourcemanager.RMActiveServiceContext.applications
> ↖org.apache.hadoop.yarn.server.resourcemanager.RMContextImpl.activeServiceContext
> ↖org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor.rmContext
> ↖Java Local@3ed9ef820 (org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor)
> {code}
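A minimal sketch of applying the proposed String.intern() fix to the top chain above: intern each key and value of the environment map as it is populated. This is an illustration under stated assumptions, not the YARN-7386 patch; `EnvironmentInterning` and `internedCopy` are hypothetical names, and a plain `Map<String, String>` stands in for the map held in `ContainerLaunchContextPBImpl.environment`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of YARN: copies an environment map,
// interning every non-null key and value along the way.
final class EnvironmentInterning {
    private EnvironmentInterning() {}

    static Map<String, String> internedCopy(Map<String, String> env) {
        Map<String, String> result = new HashMap<>(env.size() * 2);
        for (Map.Entry<String, String> e : env.entrySet()) {
            String k = e.getKey() == null ? null : e.getKey().intern();
            String v = e.getValue() == null ? null : e.getValue().intern();
            result.put(k, v);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two applications with identical environments; each string is a
        // separate object, as with independently deserialized submissions.
        Map<String, String> app1 = new HashMap<>();
        app1.put(new String("JAVA_HOME"), new String("/usr/java/default"));
        Map<String, String> app2 = new HashMap<>();
        app2.put(new String("JAVA_HOME"), new String("/usr/java/default"));

        // Equal values, but two distinct String objects on the heap.
        System.out.println(app1.get("JAVA_HOME") == app2.get("JAVA_HOME")); // false

        // After interning, both maps reference the same canonical instance.
        Map<String, String> i1 = internedCopy(app1);
        Map<String, String> i2 = internedCopy(app2);
        System.out.println(i1.get("JAVA_HOME") == i2.get("JAVA_HOME")); // true
    }
}
```

In the real code the interning would more likely happen where the PBImpl builds its map from the protobuf message, so the duplicates are never retained in the first place.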
> However, there are many other sources as well; most of them are strings in protobuf messages
> or protobuf builder objects. I plan to eliminate at least the worst offenders by inserting
> String.intern() calls. String.intern() used to allocate interned strings in PermGen and did not
> scale well until roughly the early JDK 7 versions, but it has improved greatly since then, and
> I've used it many times without any issues.

This message was sent by Atlassian JIRA

