hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12593) multiple "volatile long" field declarations exist in the Hadoop codebase
Date Mon, 30 Nov 2015 22:32:11 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032618#comment-15032618
] 

Colin Patrick McCabe commented on HADOOP-12593:
-----------------------------------------------

[~gliptak], as the comment you pasted states, "there is only a single writer to thread-local
StatisticsData objects."  A single writer cannot race with itself, as [~sjlee0] commented.

[~sjlee0] wrote:
bq. I'd be the first to admit that the way it is written is not very clear or intentional,
but t appears correct. The only exception to the single-writer rule is Statistics.rootData.
However, that is also currently protected by the Statistics instance lock. Again, that could
be made more explicit (e.g. sometimes synchronization is done in a couple of layers above
in the call chain).

Hmm.  The declaration of {{rootData}} has a comment saying that it is protected by the Statistics
lock.  There's also a comment near {{reset}} talking about the locking, commenting that "Both
reads and writes to rootData are done under the lock, so we're free to modify rootData from
any thread that holds the lock".  Are other places we could add comments to make it clearer?
 To be honest, I think the locking here is better documented than most places in our code
base.

I did a quick scan of our uses of {{volatile long}}, and the only place I can find that might
be a problem is here:
{{hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java}}
might have a "lost update" here:
{code}
    void benchmarkOne() throws IOException {
      for(int idx = 0; idx < opsPerThread; idx++) {
        if((localNumOpsExecuted+1) % statsOp.ugcRefreshCount == 0)
          refreshUserMappingsProto.refreshUserToGroupsMappings();
        long stat = statsOp.executeOp(daemonId, idx, arg1);
        localNumOpsExecuted++;
        localCumulativeTime += stat;  <=== should be using AtomicLong
      }
    }
{code}

{{localCumulativeTime}} should be an AtomicLong to prevent lost updates.

Perhaps we should change this JIRA to be about fixing this issue.  Or maybe we should close
this and file the {{localCumulativeTime}} issue as a follow-on, to make it less confusing.
 [~steve_l], what do you think?

> multiple "volatile long" field declarations exist in the Hadoop codebase
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-12593
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12593
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> If you get your IDE to scan for "volatile long", you find 20-30 entries. Volatile operations
on `long` variables are not guaranteed to be atomic, so these usages can be vulnerable to
race conditions generating invalid data.
> they need to be replaced by AtomicLong references, except in the specific case that you
want performance values for statistics, and are prepared to take the risk



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message