hadoop-yarn-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-9067) YARN Resource Manager is running OOM because of leak of Configuration Object
Date Wed, 28 Nov 2018 20:55:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-9067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16702378#comment-16702378 ]

Eric Yang commented on YARN-9067:

When a YARN service throws an exception during a service operation, the FileSystem may not be closed properly.
Patch 002 addresses this by making sure the close method is called when an operation fails.
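
The idea of closing on failure can be sketched as follows. This is a minimal illustration, not the contents of patch 002: `FakeFileSystem`, `submit`, `launch`, and `OPEN_HANDLES` are hypothetical stand-ins so the example runs without Hadoop on the classpath. It relies only on the fact that a `Closeable` resource opened in a try-with-resources statement is closed even when the body throws.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

class CloseOnFailureSketch {
    // Counts handles that were opened but not yet closed (hypothetical bookkeeping).
    static final AtomicInteger OPEN_HANDLES = new AtomicInteger();

    // Hypothetical stand-in for a file system handle; not Hadoop's FileSystem.
    static final class FakeFileSystem implements Closeable {
        FakeFileSystem() {
            OPEN_HANDLES.incrementAndGet();
        }

        // Simulates a service operation that fails for an empty service name.
        void submit(String serviceName) throws IOException {
            if (serviceName.isEmpty()) {
                throw new IOException("service name must not be empty");
            }
        }

        @Override
        public void close() {
            OPEN_HANDLES.decrementAndGet();
        }
    }

    // Runs one operation; the handle is closed on both the success and the failure path.
    static boolean launch(String serviceName) {
        try (FakeFileSystem fs = new FakeFileSystem()) {
            fs.submit(serviceName);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        launch("sleeper"); // succeeds
        launch("");        // fails, but close() still runs
        System.out.println("open handles: " + OPEN_HANDLES.get()); // prints "open handles: 0"
    }
}
```

Hadoop's `FileSystem` implements `java.io.Closeable`, so the same try-with-resources shape applies to it; whether the patch uses that form or an explicit `finally` block is not stated here.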

> YARN Resource Manager is running OOM because of leak of Configuration Object
> ----------------------------------------------------------------------------
>                 Key: YARN-9067
>                 URL: https://issues.apache.org/jira/browse/YARN-9067
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>         Attachments: YARN-9067.001.patch, YARN-9067.002.patch, image-2018-11-27-09-55-16-549.png
> The Resource Manager runs out of memory every 2-3 days in the dev cluster.
>  After analyzing the memory dump, it looks like HDFS is leaking Configuration objects, causing the YARN RM to OOM.
>  GC Logs:
> {code:java}
> PSYoungGen      total 52736K, used 37813K [0x00000000eab00000, 0x00000000eec80000, 0x0000000100000000)
>   eden space 39424K, 95% used [0x00000000eab00000,0x00000000ecfed620,0x00000000ed180000)
>   from space 13312K, 0% used [0x00000000edf80000,0x00000000edf80000,0x00000000eec80000)
>   to   space 13824K, 0% used [0x00000000ed180000,0x00000000ed180000,0x00000000edf00000)
>  ParOldGen       total 699392K, used 699329K [0x00000000c0000000, 0x00000000eab00000,
>   object space 699392K, 99% used [0x00000000c0000000,0x00000000eaaf04a8,0x00000000eab00000)
>  Metaspace       used 98178K, capacity 99932K, committed 100440K, reserved 1138688K
>   class space    used 10481K, capacity 10829K, committed 10880K, reserved 1048576K
> {code}
> There are more than 8K objects of org/apache/Hadoop/Conf, and the most frequent code path creating
> Hadoop Configuration objects comes from org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.
> All of these objects are kept in memory; see the attached screenshot for the path to the GC root
> for the conf object.
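
The retention pattern described above can be illustrated in general terms. This is a sketch under stated assumptions, not Hadoop's internals: `Conf`, `Client`, and `LIVE_CLIENTS` are hypothetical names. It shows how objects that each carry their own configuration copy stay reachable from a static (GC-root) reference until they are closed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConfRetentionSketch {
    // Hypothetical stand-in for a configuration object; not Hadoop's Configuration.
    static final class Conf {
        final Map<String, String> props = new HashMap<>();
    }

    // Hypothetical long-lived registry. Being static, it keeps every
    // registered client (and the Conf it holds) reachable from a GC root.
    static final List<Client> LIVE_CLIENTS = new ArrayList<>();

    static final class Client implements AutoCloseable {
        final Conf conf = new Conf(); // each client carries its own Conf copy

        Client() {
            LIVE_CLIENTS.add(this);
        }

        @Override
        public void close() {
            LIVE_CLIENTS.remove(this); // only close() releases the reference
        }
    }

    // Returns how many clients (and Conf objects) remain reachable after the run.
    static int retainedAfter(int operations, boolean closeClients) {
        LIVE_CLIENTS.clear();
        for (int i = 0; i < operations; i++) {
            Client c = new Client();
            if (closeClients) {
                c.close();
            }
        }
        return LIVE_CLIENTS.size();
    }

    public static void main(String[] args) {
        System.out.println(retainedAfter(8000, false)); // prints 8000
        System.out.println(retainedAfter(8000, true));  // prints 0
    }
}
```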

This message was sent by Atlassian JIRA
