hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11361) Fix a race condition in MetricsSourceAdapter.updateJmxCache
Date Thu, 07 Jul 2016 21:22:11 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366789#comment-15366789
] 

Yongjun Zhang commented on HADOOP-11361:
----------------------------------------

Hi [~vinayrpet] and [~ozawa],

Thanks for the discussion.

First of all, I tend to agree with Tsuyoshi on the AND/OR analysis, however, I saw that all
calls to {{getMetrics}} method passed true to {{all}} parameter, so this change won't impact
the current final result. But we certainly can address this issue, either as a separate jira,
or a side effect of the fix here.
 
Thanks  [~vinayrpet] for the explanation of the race condition. Sorry I did miss the "return"
statement when making the earlier comment.

It looks to me that the acquirement of {{lastRecs}} and the {{updateAttrCache}} should be
protected in a same {{synchronized(this)}} block, to avoid this race condition. And indeed
the two threads Vinay described intend to have their own builder, their own lastRecs.  Say,
we can break down the {{getMetrics}} method into two parts, 

{code}
  void getMetricsPart1(MetricsCollectorImpl builder,
                                         boolean all) {
    builder.setRecordFilter(recordFilter).setMetricFilter(metricFilter);
    synchronized(this) {
      if (lastRecs == null || jmxCacheTS == 0) {
        all = true; // Get all the metrics to populate the sink caches
      }
    }
    try {
      source.getMetrics(builder, all);
    } catch (Exception e) {
      LOG.error("Error getting metrics from source "+ name, e);
    }
    for (MetricsRecordBuilderImpl rb : builder) {
      for (MetricsTag t : injectedTags) {
        rb.add(t);
      }
    }
  }

  Iterable<MetricsRecordImpl> getMetricsPart2(MetricsCollectorImpl builder) {
      lastRecs = builder.getRecords();
      return lastRecs;
  }

{code}

and change 
{code}
    if (getAllMetrics) {
      MetricsCollectorImpl builder = new MetricsCollectorImpl();
      getMetrics(builder, true);
    }

    synchronized(this) {
      updateAttrCache();
      if (getAllMetrics) {
        updateInfoCache();
      }
      jmxCacheTS = Time.now();
      lastRecs = null;  // in case regular interval update is not running
      lastRecsCleared = true;
    }

{code}

to

{code}
    MetricsCollectorImpl builder = NULL;
    if (getAllMetrics) {
      builder = new MetricsCollectorImpl();
      getMetricsPart1(builder, true);
    }

    synchronized(this) {
      if (getAllMetrics) {
         getMetricsPart2(builder);
      }
      updateAttrCache();
      if (getAllMetrics) {
        updateInfoCache();
      }
      jmxCacheTS = Time.now();
      lastRecs = null;  // in case regular interval update is not running
      lastRecsCleared = true;
    }
{code}

This is just to throw an idea to see what's the best solution. The method names can be adjusted.

What do you guys think?

Thanks.


> Fix a race condition in MetricsSourceAdapter.updateJmxCache
> -----------------------------------------------------------
>
>                 Key: HADOOP-11361
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11361
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.4.1, 2.5.1, 2.6.0
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>         Attachments: HADOOP-111361-003.patch, HADOOP-11361-002.patch, HADOOP-11361-004.patch,
HADOOP-11361.patch, HDFS-7487.patch
>
>
> {noformat}
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateAttrCache(MetricsSourceAdapter.java:247)
> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:177)
> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getAttribute(MetricsSourceAdapter.java:102)
> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message