hadoop-common-issues mailing list archives

From "Tsuyoshi Ozawa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11361) Fix a race condition in MetricsSourceAdapter.updateJmxCache
Date Thu, 07 Jul 2016 14:16:11 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366173#comment-15366173 ]

Tsuyoshi Ozawa commented on HADOOP-11361:
-----------------------------------------

Thank you for the explanation and reviews. I took a deeper look. I think we should check the overall
semantics of MetricsSourceAdapter instead of adding a workaround.

First, I suspect that {{getMetrics}} has a semantic bug. The following condition checks whether
{{infoCache}} should be updated.
{code}
      if (lastRecs == null && jmxCacheTS == 0) {
        all = true; // Get all the metrics to populate the sink caches
      }
{code}

{{infoCache}} should be updated in the following cases:
1. After {{updateAttrCache}} is called. This is expressed as {{lastRecs == null}}.
2. Before initialization is done, that is, before {{updateJmxCache}} is called. This is expressed as
{{jmxCacheTS == 0}}.

I think these conditions should be combined with {{OR}}, not {{AND}}, so it can be fixed as
follows:

{code}
      if (lastRecs == null || jmxCacheTS == 0) {
        all = true; // Get all the metrics to populate the sink caches
      }
{code}

What do you think?
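
To make the difference between the two conditions concrete, here is a minimal standalone sketch. It is not the actual MetricsSourceAdapter code: the class and method names are made up for illustration, and {{lastRecs}} and {{jmxCacheTS}} are passed in as plain parameters instead of being fields.

```java
// Hypothetical sketch: compares the AND and OR forms of the refresh condition.
// Only the two boolean expressions come from the snippets above; the class and
// method names are assumptions made for this illustration.
class RefreshConditionSketch {

    // Current condition: refresh everything only when BOTH triggers hold.
    static boolean refreshAllWithAnd(Object lastRecs, long jmxCacheTS) {
        return lastRecs == null && jmxCacheTS == 0;
    }

    // Proposed condition: refresh everything when EITHER trigger holds.
    static boolean refreshAllWithOr(Object lastRecs, long jmxCacheTS) {
        return lastRecs == null || jmxCacheTS == 0;
    }

    public static void main(String[] args) {
        Object recs = new Object(); // stands in for a non-null lastRecs
        Object[] recsCases = {null, null, recs, recs};
        long[] tsCases = {0L, 123L, 0L, 123L};

        // Print the result of both predicates for each combination of inputs.
        for (int i = 0; i < recsCases.length; i++) {
            System.out.println(
                "lastRecs==null: " + (recsCases[i] == null)
                + ", jmxCacheTS: " + tsCases[i]
                + ", AND: " + refreshAllWithAnd(recsCases[i], tsCases[i])
                + ", OR: " + refreshAllWithOr(recsCases[i], tsCases[i]));
        }
    }
}
```

The two forms agree when both triggers hold or neither does. They differ only when exactly one trigger holds, for example when {{lastRecs}} is null but {{jmxCacheTS}} is already non-zero: the {{AND}} form skips the full refresh there, while the {{OR}} form performs it.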

Next, the NPE-related problem:

{quote}
Race condition is there between two threads calling updateJmxCache() at same time.
{quote}

You're right. The v3 patch fixed the race condition, but it introduced a deadlock between JMXJsonServlet
and the ResourceManager's metrics system, as Jason mentioned on HADOOP-12594:

{quote}
The timer thread has the MetricsSystemImpl lock and is trying to grab the MetricsSourceAdapter
lock. In the meantime the JMX thread has the MetricsSourceAdapter lock and is trying to grab
the MetricsSystemImpl lock. The locking order isn't consistent so we deadlocked.
{quote}
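
The lock-order inversion described in the quote is a general pattern, and one common way to avoid it is to make every code path take the locks in the same order. The sketch below only illustrates that general technique. It is not the fix proposed in any of the attached patches, and the two locks and all names in it are hypothetical stand-ins for the MetricsSystemImpl and MetricsSourceAdapter monitors.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of consistent lock ordering. The locks below merely stand in
// for the MetricsSystemImpl and MetricsSourceAdapter monitors; nothing here is
// taken from the real Hadoop classes.
class LockOrderSketch {
    static final ReentrantLock systemLock = new ReentrantLock();  // "MetricsSystemImpl" stand-in
    static final ReentrantLock adapterLock = new ReentrantLock(); // "MetricsSourceAdapter" stand-in

    // Every caller acquires systemLock first and adapterLock second, so two
    // threads can never each hold one lock while waiting for the other.
    static void withBothLocks(Runnable body) {
        systemLock.lock();
        try {
            adapterLock.lock();
            try {
                body.run();
            } finally {
                adapterLock.unlock();
            }
        } finally {
            systemLock.unlock();
        }
    }

    // Runs a "timer" thread and a "jmx" thread that both need both locks,
    // and returns how many critical sections completed in total.
    static int runBoth(int iterations) {
        int[] counter = {0}; // only modified while both locks are held
        Runnable work = () -> {
            for (int i = 0; i < iterations; i++) {
                withBothLocks(() -> counter[0]++);
            }
        };
        Thread timer = new Thread(work, "timer");
        Thread jmx = new Thread(work, "jmx");
        timer.start();
        jmx.start();
        try {
            timer.join();
            jmx.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("completed critical sections: " + runBoth(1000));
    }
}
```

If one of the two threads instead took {{adapterLock}} first and {{systemLock}} second, the situation from the quote could occur: each thread holds one lock and waits forever for the other.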

Brahma's solution is a bit tricky, so please give me some time to verify it.

> Fix a race condition in MetricsSourceAdapter.updateJmxCache
> -----------------------------------------------------------
>
>                 Key: HADOOP-11361
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11361
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.4.1, 2.5.1, 2.6.0
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>         Attachments: HADOOP-111361-003.patch, HADOOP-11361-002.patch, HADOOP-11361-004.patch, HADOOP-11361.patch, HDFS-7487.patch
>
>
> {noformat}
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateAttrCache(MetricsSourceAdapter.java:247)
> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:177)
> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getAttribute(MetricsSourceAdapter.java:102)
> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

