hadoop-hdfs-dev mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-889) Possible race condition in BlocksMap.NodeIterator.
Date Sat, 09 Jan 2010 14:00:54 GMT
Possible race condition in BlocksMap.NodeIterator.

                 Key: HDFS-889
                 URL: https://issues.apache.org/jira/browse/HDFS-889
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
    Affects Versions: 0.22.0
            Reporter: Steve Loughran

Hudson's test run for HDFS-165 is showing an NPE in {{org.apache.hadoop.hdfs.server.namenode.TestNodeCount.testNodeCount()}}.
One problem could be in {{BlocksMap.NodeIterator}}. Its {{hasNext()}} method checks that the next
entry isn't null. But what if, between the {{hasNext()}} call and the {{next()}} call, the map
changes and that entry goes away? In that situation, the node returned from {{next()}} will be null.
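To make the check-then-act gap concrete, here is a minimal Java sketch. It is a hypothetical, simplified model, not the Hadoop source: the names RacyNodeIterator and RaceDemo are made up, and a plain String[] of slots stands in for the real block/datanode structure. Instead of using real threads, it replays the problematic interleaving deterministically: {{hasNext()}} sees a populated slot, the slot is then cleared, and {{next()}} hands back null.

```java
import java.util.Iterator;

/**
 * Hypothetical, simplified model of a NodeIterator-style iterator: it walks a
 * shared array of node slots and stops at the first null slot.
 * This is NOT the Hadoop code; it only illustrates the check-then-act gap.
 */
class RacyNodeIterator implements Iterator<String> {
  private final String[] slots; // shared with other code that may clear entries
  private int idx = 0;

  RacyNodeIterator(String[] slots) {
    this.slots = slots;
  }

  @Override
  public boolean hasNext() {
    // The "check": is the next slot populated right now?
    return idx < slots.length && slots[idx] != null;
  }

  @Override
  public String next() {
    // The "act": no re-check, so a slot cleared after hasNext() comes back as null.
    return slots[idx++];
  }
}

class RaceDemo {
  /** Replays the bad interleaving: hasNext(), then a removal, then next(). */
  static String replayInterleaving() {
    String[] slots = {"dn1", "dn2"};
    RacyNodeIterator it = new RacyNodeIterator(slots);
    it.next();                   // consume "dn1"
    boolean more = it.hasNext(); // true: slot 1 still holds "dn2"
    slots[1] = null;             // simulated concurrent removal of that entry
    return more ? it.next() : "no more"; // next() now returns null
  }

  public static void main(String[] args) {
    System.out.println("next() returned: " + replayInterleaving());
  }
}
```

Running RaceDemo prints "next() returned: null", which is exactly the value callers then dereference.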

This is potentially serious: a quick look through the code shows that the iterator is used
in many places, and every caller in Hadoop assumes the returned value is not null. It's also
one of those problems that doesn't have a simple "make it go away" fix. The options are:

# Ignore it, hope it doesn't happen very often, and assume the test failure was a one-off that will
never happen in a production datacentre. This is the default. The iterator is only used in
the namenode, so while it does depend on the number of datanodes, it isn't running on 4000 machines
in a big cluster.
# Leave the iterator as is, and have all the in-Hadoop code check for a null value and break out of the loop.
# Patch the {{NodeIterator}} to be consistent with the {{Iterator}} specification and throw
a {{NoSuchElementException}} if the next value is null. This does not make the problem go
away, but it is then handled by having every in-Hadoop use catch the exception at the right
point and exit the loop.
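The third option could look roughly like the following sketch. It again uses a hypothetical slot-array model rather than the real BlocksMap code, and CheckedNodeIterator and CallerDemo are illustrative names. {{next()}} re-checks the entry and throws {{NoSuchElementException}} instead of returning null, and the caller catches it and leaves the loop. As the option itself says, this does not remove the underlying race (no locking or memory-visibility guarantees are modelled here); it only turns a silent null into a failure that callers can handle.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch of option 3: next() re-validates the slot and honours
 * the Iterator contract by throwing instead of returning null.
 */
class CheckedNodeIterator implements Iterator<String> {
  private final String[] slots; // shared with code that may clear entries
  private int idx = 0;

  CheckedNodeIterator(String[] slots) {
    this.slots = slots;
  }

  @Override
  public boolean hasNext() {
    return idx < slots.length && slots[idx] != null;
  }

  @Override
  public String next() {
    // Re-check at the point of use: the entry may have vanished since hasNext().
    if (idx >= slots.length) {
      throw new NoSuchElementException("iterator exhausted");
    }
    String node = slots[idx];
    if (node == null) {
      throw new NoSuchElementException("entry " + idx + " removed during iteration");
    }
    idx++;
    return node;
  }
}

class CallerDemo {
  /**
   * A caller written the way option 3 requires. The hook runs between
   * hasNext() and next() to simulate another thread mutating the map there.
   */
  static List<String> collectNodes(Iterator<String> it, Runnable betweenCheckAndFetch) {
    List<String> seen = new ArrayList<>();
    try {
      while (it.hasNext()) {
        betweenCheckAndFetch.run();
        seen.add(it.next());
      }
    } catch (NoSuchElementException e) {
      // The entry vanished after hasNext(): exit the loop with what was collected.
    }
    return seen;
  }

  public static void main(String[] args) {
    String[] slots = {"dn1", "dn2", "dn3"};
    int[] calls = {0};
    // On the second pass, clear the slot that hasNext() has just approved.
    List<String> seen = collectNodes(new CheckedNodeIterator(slots),
        () -> { if (++calls[0] == 2) slots[1] = null; });
    System.out.println(seen); // [dn1]
  }
}
```

The design cost is the one the option names: every loop over the iterator needs the try/catch, so the fix is only as good as the callers that adopt it.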

Testing: this should be possible to reproduce.
# Create a block map.
# Start iterating over a block.
# While the iteration is in progress, remove the next block in the list. Expect the following call
to {{next()}} to fail in whatever way you choose.
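The three testing steps could be written as a standalone Java check along these lines. This is a hypothetical sketch: ToyBlocksMap is a toy HashMap from block ID to datanode slots, not Hadoop's BlocksMap; its nodeIterator() is assumed to follow the "throw {{NoSuchElementException}}" option; and "remove the next block in the list" is interpreted as clearing the entry the iterator would return next. No JUnit or Hadoop test utilities are used, so in a real test the assertion would target whichever failure mode is chosen.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical toy stand-in for a block map: block ID -> datanode slots. */
class ToyBlocksMap {
  private final Map<Long, String[]> blocks = new HashMap<>();

  void addBlock(long blockId, String... nodes) {
    blocks.put(blockId, nodes.clone());
  }

  /** Simulates a datanode entry for a block disappearing. */
  void removeNode(long blockId, int slot) {
    blocks.get(blockId)[slot] = null;
  }

  /** Iterator over a block's nodes; assumed to throw rather than return null. */
  Iterator<String> nodeIterator(long blockId) {
    String[] slots = blocks.get(blockId);
    return new Iterator<String>() {
      private int idx = 0;

      @Override
      public boolean hasNext() {
        return idx < slots.length && slots[idx] != null;
      }

      @Override
      public String next() {
        if (idx >= slots.length || slots[idx] == null) {
          throw new NoSuchElementException();
        }
        return slots[idx++];
      }
    };
  }
}

class NodeIteratorRaceTest {
  /** Returns true if next() failed as expected after the entry was removed. */
  static boolean removalDuringIterationFails() {
    ToyBlocksMap map = new ToyBlocksMap();      // 1. create a block map
    map.addBlock(1L, "dn1", "dn2", "dn3");
    Iterator<String> it = map.nodeIterator(1L); // 2. iterate over a block
    it.next();                                  //    consume "dn1"
    if (!it.hasNext()) {
      return false;                             //    "dn2" should still be present
    }
    map.removeNode(1L, 1);                      // 3. remove the next entry mid-iteration
    try {
      it.next();
      return false; // no exception: the silent-null behaviour described in this issue
    } catch (NoSuchElementException expected) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("removal during iteration detected: " + removalDuringIterationFails());
  }
}
```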

This message is automatically generated by JIRA.
