hadoop-yarn-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
Date Tue, 19 May 2015 18:01:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550894#comment-14550894 ]

zhihai xu commented on YARN-3619:
---------------------------------

I uploaded a patch, YARN-3619.000.patch, for review. It adds a configuration, NM_CONTAINER_METRICS_UNREGISTER_DELAY_MS,
that sets how long after a container finishes its metrics are unregistered. I did not schedule
the unregistration thread from getMetrics, because that could cause a memory leak.
It looks like getMetrics is called from only two places: MetricsSystemImpl#sampleMetrics and
MetricsSourceAdapter#getMBeanInfo.
sampleMetrics won't be called if MetricsSystemImpl has no sinks, and getMBeanInfo may not be called
after registration if JMXJsonServlet#doGet is never called (no HTTP GET request from JMX clients).
So it is possible that getMetrics is never called after registration, in which case the
unregistration would never be scheduled.
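
The approach described above can be sketched as follows. This is only an illustration, not the code in YARN-3619.000.patch: the class, the registry map, and the method names are hypothetical, and the 50 ms delay stands in for whatever value NM_CONTAINER_METRICS_UNREGISTER_DELAY_MS would hold. The point is that the unregistration is scheduled when the container finishes, so it does not depend on getMetrics ever being called.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class DelayedUnregisterSketch {
    /** Hypothetical registry standing in for the metrics system's sources. */
    static final Map<String, Object> REGISTRY = new ConcurrentHashMap<>();

    /** Single daemon thread that runs the delayed unregistrations. */
    static final ScheduledExecutorService SCHEDULER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "unregister-sketch");
            t.setDaemon(true);
            return t;
        });

    static void register(String id) {
        REGISTRY.put(id, new Object());
    }

    /**
     * Called when the container finishes. The removal is scheduled here,
     * not inside getMetrics, so it runs even if nothing ever samples the source.
     */
    static ScheduledFuture<?> finished(String id, long delayMs) {
        return SCHEDULER.schedule(() -> { REGISTRY.remove(id); },
                                  delayMs, TimeUnit.MILLISECONDS);
    }

    /** Registers one source, finishes it, and waits for the delayed removal. */
    static boolean demo() {
        register("container_1");
        boolean presentBefore = REGISTRY.containsKey("container_1");
        ScheduledFuture<?> pending = finished("container_1", 50L); // assumed delay
        try {
            pending.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            return false;
        }
        return presentBefore && !REGISTRY.containsKey("container_1");
    }

    public static void main(String[] args) {
        System.out.println("unregistered after delay: " + demo());
    }
}
```

Because the removal runs on the scheduler thread rather than inside a getMetrics call, it also cannot modify the source collection from within the loop that is iterating it.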


> ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-3619
>                 URL: https://issues.apache.org/jira/browse/YARN-3619
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.0
>            Reporter: Jason Lowe
>            Assignee: zhihai xu
>         Attachments: YARN-3619.000.patch, test.patch
>
>
> ContainerMetrics is able to unregister itself during the getMetrics method, but that
> method can be called by MetricsSystemImpl.sampleMetrics, which is trying to iterate the sources.
> This leads to a ConcurrentModificationException log like this:
> {noformat}
> 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException
> {noformat}
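
The failure mode in the description can be reproduced in isolation with a small sketch. Nothing here is Hadoop code: the Source interface and the SOURCES map are hypothetical stand-ins, and a LinkedHashMap is assumed only to make the iteration order deterministic. A source that removes itself from the collection while that collection is being iterated makes the iterator fail on its next step.

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;

class SelfUnregisterSketch {
    /** Hypothetical stand-in for a metrics source; not the Hadoop interface. */
    interface Source {
        void getMetrics();
    }

    /** Hypothetical registry; insertion-ordered so the run is deterministic. */
    static final Map<String, Source> SOURCES = new LinkedHashMap<>();

    /** Walks every source, as a sampling timer would. Returns true if the walk hit a CME. */
    static boolean sampleOnce() {
        try {
            for (Source s : SOURCES.values()) {
                s.getMetrics();
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Sets up one self-unregistering source followed by an ordinary one. */
    static boolean reproduce() {
        SOURCES.clear();
        // A "finished" source that unregisters itself from inside getMetrics.
        SOURCES.put("finished", () -> SOURCES.remove("finished"));
        // A source that does nothing; its presence forces the iterator to advance.
        SOURCES.put("running", () -> { });
        return sampleOnce();
    }

    public static void main(String[] args) {
        System.out.println("CME during sampling: " + reproduce());
    }
}
```

The exception comes from the iterator's fail-fast check rather than from a second thread: the removal happens on the same thread that is running the loop.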



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
