hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12482) Race condition in JMX cache update
Date Tue, 10 Nov 2015 11:34:11 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998431#comment-14998431
] 

Hudson commented on HADOOP-12482:
---------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #591 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/591/])
HADOOP-12482. Race condition in JMX cache update. (Tony Wu via lei) (lei: rev 0eb9c60c5bec79f531da8cb3226d7e8b1d7e6639)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSourceAdapter.java
* hadoop-common-project/hadoop-common/CHANGES.txt
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/metrics2/impl/TestMetricsSourceAdapter.java


> Race condition in JMX cache update
> ----------------------------------
>
>                 Key: HADOOP-12482
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12482
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Tony Wu
>            Assignee: Tony Wu
>             Fix For: 2.8.0, 3.0.0
>
>         Attachments: HADOOP-12482.001.patch, HADOOP-12482.002.patch, HADOOP-12482.003.patch,
HADOOP-12482.004.patch, HADOOP-12482.005.patch, HADOOP-12482.006.patch
>
>
> updateJmxCache() was updated in HADOOP-11301. However the patch introduced a race condition.
In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
>     boolean getAllMetrics = false;
>     synchronized (this) {
>       if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
>         // temporarilly advance the expiry while updating the cache
>         jmxCacheTS = Time.now() + jmxCacheTTL;
>         if (lastRecs == null) {
>           getAllMetrics = true;
>         }
>       } else {
>         return;
>       }
>       if (getAllMetrics) {
>         MetricsCollectorImpl builder = new MetricsCollectorImpl();
>         getMetrics(builder, true);
>       }
>       updateAttrCache();
>       if (getAllMetrics) {
>         updateInfoCache();
>       }
>       jmxCacheTS = Time.now();
>       lastRecs = null; // in case regular interval update is not running
>     }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached info. 
> ** lastRecs is set to null.
> # metrics sources is updated with new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a different thread
getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will not be set
to true. So updateInfoCache() is not called and getMBeanInfo() returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published until much
later.
> The case can be made worse by a periodic call to getMetrics() (driven by an external
program or script). In such case getMBeanInfo() may never be able to retrieve the new record.
> The desired behavior should be that updateJmxCache() will guarantee to call updateInfoCache()
once after jmxCacheTTL, if lastRecs has been set to null by updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message