hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-13339) MetricsSourceAdapter#updateAttrCache may throw NPE due to NULL lastRecs
Date Tue, 05 Jul 2016 01:39:11 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361880#comment-15361880
] 

Yongjun Zhang edited comment on HADOOP-13339 at 7/5/16 1:38 AM:
----------------------------------------------------------------

Hi [~ozawa],

Thanks for pointing me to HADOOP-11361.

I was about to update here:

Per the trace stack I'm seeing:
{code}
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateAttrCache(MetricsSourceAdapter.java:260)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:184)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:155)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBeanInfo(DefaultMBeanServerInterceptor.java:1378)
{code}

The relevant code is

{code}
  private void updateJmxCache() {
    boolean getAllMetrics = false;
    synchronized(this) {
      if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
        // temporarilly advance the expiry while updating the cache
        jmxCacheTS = Time.now() + jmxCacheTTL;
        // lastRecs might have been set to an object already by another thread.
        // Track the fact that lastRecs has been reset once to make sure refresh
        // is correctly triggered.
        if (lastRecsCleared) {
          getAllMetrics = true;
          lastRecsCleared = false;
        }
      }
      else {
        return;
      }
    }

    if (getAllMetrics) {
      MetricsCollectorImpl builder = new MetricsCollectorImpl();
      getMetrics(builder, true); <== This is where lastRecs is created/assigned non-NULL
    }

    synchronized(this) {
      updateAttrCache(); <=== This is where the NPE happened
      if (getAllMetrics) {
        updateInfoCache();
      }
      jmxCacheTS = Time.now();
      lastRecs = null;  // in case regular interval update is not running <==  This is
where lastRecs assigned NULL
      lastRecsCleared = true;
    }
  }
{code}

If one thread enters the above method with {{Time.now() - jmxCacheTS >= jmxCacheTTL}} being
false, then {{getAllMetrics}} will continue to be false, then when {{updateAttrCache()}} is
called, it will hit NULL {{lastRecs}}

Even a single thread like this would hit the issue. So it doesn't seem a pure synchronization
issue.  And the code prior to HADOOP-12482 appear to have the same issue.

Brief look at HADOOP-12482, the logic seems fine, except the hole pointed out here. While
we need to examine the synchronization a bit further, have a checking in {{MetricsSourceAdapter#updateAttrCache}}
(so not to access it when lastRecs is NULL) seems reasonable.

What do you think?

Thanks.




was (Author: yzhangal):
Hi [~ozawa],

Thanks for pointing me to HADOOP-11361.

I was about to update here:

Per the trace stack I'm seeing:
{code}
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateAttrCache(MetricsSourceAdapter.java:260)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:184)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:155)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBeanInfo(DefaultMBeanServerInterceptor.java:1378)
{code}

The relevant code is

{code}
  private void updateJmxCache() {
    boolean getAllMetrics = false;
    synchronized(this) {
      if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
        // temporarilly advance the expiry while updating the cache
        jmxCacheTS = Time.now() + jmxCacheTTL;
        // lastRecs might have been set to an object already by another thread.
        // Track the fact that lastRecs has been reset once to make sure refresh
        // is correctly triggered.
        if (lastRecsCleared) {
          getAllMetrics = true;
          lastRecsCleared = false;
        }
      }
      else {
        return;
      }
    }

    if (getAllMetrics) {
      MetricsCollectorImpl builder = new MetricsCollectorImpl();
      getMetrics(builder, true); <== This is where lastRecs is created/assigned non-NULL
    }

    synchronized(this) {
      updateAttrCache(); <=== This is where the NPE happened
      if (getAllMetrics) {
        updateInfoCache();
      }
      jmxCacheTS = Time.now();
      lastRecs = null;  // in case regular interval update is not running <==  This is
where lastRecs assigned NULL
      lastRecsCleared = true;
    }
  }
{code}

If one thread enters the above method with {{Time.now() - jmxCacheTS >= jmxCacheTTL}} being
false, then {{getAllMetrics}} will continue to be false, then when {{updateAttrCache()}} is
called, it will hit NULL {{lastRecs}}

Even a single thread like this would hit the issue. So it doesn't seem a pure synchronization
issue.  And the code prior to HADOOP-12482 appear to have the same issue.

Brief look at HADOOP-12482, the logic seems fine, except the hole pointed out here. While
we need to examine the synchronization a bit further, have a checking in MetricsSourceAdapter#updateAttrCache
(so not to access it when lastRecs is NULL) seems reasonable.

What do you think?

Thanks.



> MetricsSourceAdapter#updateAttrCache may throw NPE due to NULL lastRecs
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-13339
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13339
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>
> The for loop below may find lastRecs NULL
> {code}
>   private int updateAttrCache() {
>     LOG.debug("Updating attr cache...");
>     int recNo = 0;
>     int numMetrics = 0;
>     for (MetricsRecordImpl record : lastRecs) {
>       for (MetricsTag t : record.tags()) {
>         setAttrCacheTag(t, recNo);
>         ++numMetrics;
>       }
>       for (AbstractMetric m : record.metrics()) {
>         setAttrCacheMetric(m, recNo);
>         ++numMetrics;
>       }
>       ++recNo;
>     }
>     LOG.debug("Done. # tags & metrics="+ numMetrics);
>     return numMetrics;
>   }
> {code}
> and throws NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message