hadoop-yarn-issues mailing list archives

From "Misha Dmitriev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8872) Optimize collections used by Yarn JHS to reduce its memory
Date Fri, 12 Oct 2018 17:57:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16648225#comment-16648225 ]

Misha Dmitriev commented on YARN-8872:
--------------------------------------

Regarding the problems in the Hadoop QA report above:
 # No tests are added because this is a performance improvement with no change in functionality.
 # I believe there is no problem with synchronization in FileSystemCounterGroup.java. The {{map}}
object is created lazily in the synchronized method {{findCounter()}}, so according to the
Java Memory Model, once it's created, it's visible to all the code, both synchronized and
unsynchronized. In other words, the unsynchronized method {{write()}} (line 281, which FindBugs
complains about) will never think that {{map == null}} if {{map}} has actually been initialized.
In all other respects it works the same as before.
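The lazy-initialization pattern discussed in item 2 can be sketched roughly as follows. This is a minimal illustration, not the actual FileSystemCounterGroup code: the class name `LazyCounterGroup` and the `increment` method are hypothetical, and `write()` here just appends `name=value` lines to a `StringBuilder`. One deliberate difference: in this sketch `write()` is also `synchronized`, a conservative choice that makes the reader's view of `map` follow from the monitor's happens-before edge rather than from timing. Whether the real class needs that is exactly the question debated above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a counter group whose map is
// almost always empty; not the real FileSystemCounterGroup.
class LazyCounterGroup {
    // Stays null until the first counter is requested, so an empty group
    // allocates no HashMap (and no hash table) at all.
    private Map<String, Long> map;

    // Creates the map on first use, inside a synchronized method, as
    // findCounter() is described in the comment above.
    public synchronized long findCounter(String name) {
        if (map == null) {
            map = new HashMap<>();
        }
        return map.computeIfAbsent(name, k -> 0L);
    }

    // Adds delta to the named counter, creating it (and the map) if needed.
    public synchronized void increment(String name, long delta) {
        findCounter(name); // reentrant: ensures map and key exist
        map.merge(name, delta, Long::sum);
    }

    // Conservatively synchronized in this sketch. A null map is treated as
    // "no counters": nothing is written and 0 is returned.
    public synchronized int write(StringBuilder out) {
        if (map == null) {
            return 0;
        }
        for (Map.Entry<String, Long> e : map.entrySet()) {
            out.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return map.size();
    }
}
```

An empty group never allocates its map, which is the memory saving the issue is after. The first `findCounter()` or `increment()` call pays the allocation cost once.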

> Optimize collections used by Yarn JHS to reduce its memory
> ----------------------------------------------------------
>
>                 Key: YARN-8872
>                 URL: https://issues.apache.org/jira/browse/YARN-8872
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>            Priority: Major
>         Attachments: YARN-8872.01.patch, jhs-bad-collections.png
>
>
> Using jxray (www.jxray.com), we analyzed a heap dump of a JHS running with a big heap in
a large cluster, handling large MapReduce jobs. The heap is large (over 32GB), and 21.4% of
it is wasted due to various suboptimal Java collections, mostly maps and lists that are either
empty or contain only one element. In such under-populated collections, a considerable amount
of memory is still used by the internal implementation objects alone. See the attached excerpt
from the jxray report for details. If certain collections are almost always empty, they
should be initialized lazily. If others almost always have just 1 or 2 elements, they should
be initialized with the appropriate initial capacity of 1 or 2 (the default capacity is 16
for HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
>  # {{FileSystemCounterGroup.map}} - initialize lazily
>  # {{CompletedTask.attempts}} - initialize with capacity 2, given that most tasks only have
one or two attempts
>  # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
>  # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it contains
one diagnostic message most of the time
>  # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to use the more
wasteful LinkedList here) and initialize with capacity 1.
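The presizing items in the list above might look roughly like the sketch below. All names (`PresizedTaskRecord`, `capacityFor`, `addAttempt`, `addDiagnostic`) are hypothetical, and `String` stands in for the real attempt and diagnostic types. One detail worth checking when choosing the numbers: with the default load factor of 0.75, a `HashMap` or `LinkedHashMap` constructed with initial capacity 2 still grows its table when the second entry is added. The `capacityFor` helper shows one way to size for an expected entry count. The patch itself may simply pass 1 or 2.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder mirroring the shape of the fixes listed above;
// names and types are illustrative, not the actual JHS classes.
class PresizedTaskRecord {
    // Initial capacity that lets a HashMap/LinkedHashMap with the default
    // load factor (0.75) hold `expected` entries without resizing.
    static int capacityFor(int expected) {
        return (int) Math.ceil(expected / 0.75);
    }

    // Most tasks have one or two attempts; insertion order is preserved.
    final Map<String, String> attempts = new LinkedHashMap<>(capacityFor(2));

    // Usually a single diagnostic message: an ArrayList sized for one
    // element instead of a LinkedList, which allocates a node per element.
    final List<String> reportDiagnostics = new ArrayList<>(1);

    void addAttempt(String attemptId, String state) {
        attempts.put(attemptId, state);
    }

    void addDiagnostic(String message) {
        reportDiagnostics.add(message);
    }
}
```

For example, `capacityFor(2)` is 3, which `HashMap` rounds up to a table of 4 with a resize threshold of 3, so two attempts fit without growing the table.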



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


