hadoop-hdfs-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-889) Possible race condition in BlocksMap.NodeIterator.
Date Sat, 09 Jan 2010 14:00:54 GMT
Possible race condition in BlocksMap.NodeIterator.
--------------------------------------------------

                 Key: HDFS-889
                 URL: https://issues.apache.org/jira/browse/HDFS-889
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
    Affects Versions: 0.22.0
            Reporter: Steve Loughran


Hudson's test run for HDFS-165 is showing an NPE in {{org.apache.hadoop.hdfs.server.namenode.TestNodeCount.testNodeCount()}}.
One problem could be in {{BlocksMap.NodeIterator}}. Its {{hasNext()}} method checks that the next
entry isn't null. But what if, between the {{hasNext()}} call and the {{next()}} operation, the map
changes and an entry goes away? In that situation, the node returned from {{next()}} will be null.
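A minimal sketch of that window, in Java. {{RaceSketch}}, {{SimpleBlockInfo}} and this {{NodeIterator}} are hypothetical stand-ins modelled on the description above, not the actual Hadoop classes, and the "concurrent" removal is simulated deterministically in one thread so the bad interleaving happens every time:

```java
import java.util.Iterator;

// Hypothetical, simplified stand-in for a block's list of datanode slots.
// NOT Hadoop's BlocksMap/BlockInfo code; it only mimics the shape described
// in the report: hasNext() checks that the next slot is non-null.
public class RaceSketch {

  static class SimpleBlockInfo {
    final String[] nodes;            // datanode slots; null marks "no node"

    SimpleBlockInfo(int capacity) { nodes = new String[capacity]; }

    // Remove a node and shift later slots down, leaving a trailing null.
    void removeNode(String node) {
      for (int i = 0; i < nodes.length; i++) {
        if (node.equals(nodes[i])) {
          System.arraycopy(nodes, i + 1, nodes, i, nodes.length - i - 1);
          nodes[nodes.length - 1] = null;
          return;
        }
      }
    }
  }

  // Unsynchronized iterator: the check in hasNext() and the read in next()
  // are two separate steps, so the slot can change in between.
  static class NodeIterator implements Iterator<String> {
    private final SimpleBlockInfo block;
    private int nextIdx = 0;

    NodeIterator(SimpleBlockInfo block) { this.block = block; }

    @Override public boolean hasNext() {
      return nextIdx < block.nodes.length && block.nodes[nextIdx] != null;
    }

    @Override public String next() {
      return block.nodes[nextIdx++];   // no re-check: may return null
    }
  }

  public static void main(String[] args) {
    SimpleBlockInfo block = new SimpleBlockInfo(3);
    block.nodes[0] = "dn1";
    block.nodes[1] = "dn2";

    NodeIterator it = new NodeIterator(block);
    System.out.println(it.next());     // dn1
    if (it.hasNext()) {                // true: slot 1 still holds "dn2"
      block.removeNode("dn2");         // simulated removal by another thread
      String node = it.next();         // slot 1 is now null
      System.out.println(node == null ? "next() returned null" : node);
    }
  }
}
```

Any caller that dereferences {{node}} at that point gets the NPE.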


This is potentially serious: a quick look through the code shows that the iterator is used
in many places, and everywhere Hadoop uses it, it assumes the value is not null. It's also
one of those problems that doesn't have a simple "make it go away" fix.

Options:
# Ignore it, and hope that it doesn't happen very often and that the failing test was a one-off
that will never happen in a production datacentre. This is the default. The iterator is only used
in the namenode, so while how often it runs depends on the number of datanodes, it isn't running
on 4000 machines in a big cluster.
# Leave the iterator as is, and have all the in-Hadoop code check for a null value and break out
of the loop.
# Patch {{NodeIterator}} to be consistent with the {{Iterator}} specification and throw
a {{NoSuchElementException}} if the next value is null. This does not make the problem go
away, but it is then handled by having every in-Hadoop use catch the exception at the right
point and exit the loop.
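Options 2 and 3 could look roughly like this. This is a hypothetical Java sketch: {{IteratorOptions}}, {{CheckedNodeIterator}} and {{collectWithNullCheck}} are illustrative names, not Hadoop APIs, and a plain {{String[]}} of slots stands in for the real block structure:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of options 2 and 3. Names are illustrative only,
// not Hadoop APIs. Slots hold datanode names; null means "empty slot".
public class IteratorOptions {

  // Option 3: next() reads the slot once and, following the Iterator
  // contract, throws NoSuchElementException instead of returning null.
  static class CheckedNodeIterator implements Iterator<String> {
    private final String[] slots;
    private int nextIdx = 0;

    CheckedNodeIterator(String[] slots) { this.slots = slots; }

    @Override public boolean hasNext() {
      return nextIdx < slots.length && slots[nextIdx] != null;
    }

    @Override public String next() {
      String node = nextIdx < slots.length ? slots[nextIdx] : null;
      if (node == null) {
        throw new NoSuchElementException("no node at slot " + nextIdx);
      }
      nextIdx++;
      return node;
    }
  }

  // Option 2: the iterator is left as is; the caller treats null as
  // "entry vanished" and breaks out of the loop.
  static List<String> collectWithNullCheck(Iterator<String> it) {
    List<String> out = new ArrayList<>();
    while (it.hasNext()) {
      String node = it.next();
      if (node == null) {
        break;
      }
      out.add(node);
    }
    return out;
  }

  public static void main(String[] args) {
    // Option 2: a null slips through; the loop stops instead of hitting an NPE.
    System.out.println(collectWithNullCheck(Arrays.asList("dn1", null, "dn3").iterator())); // [dn1]

    // Option 3: simulate the removal landing between hasNext() and next(),
    // with the caller catching the exception and exiting the loop.
    String[] slots = {"dn1", "dn2", null};
    CheckedNodeIterator it = new CheckedNodeIterator(slots);
    List<String> seen = new ArrayList<>();
    try {
      while (it.hasNext()) {
        if (seen.size() == 1) {
          slots[1] = null;             // simulated concurrent removal of "dn2"
        }
        seen.add(it.next());
      }
    } catch (NoSuchElementException e) {
      // exit the loop here, keeping the nodes already seen
    }
    System.out.println(seen);          // [dn1]
  }
}
```

As option 3 says, this doesn't remove the race; it only turns a silent null into a documented exception that each caller has to handle at the loop.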

Testing: this should be possible.
# Create a block map.
# Start iterating over the nodes of a block.
# While the iterator is in progress, remove the next entry in the list. Expect the next call
to {{next()}} to fail in whatever way you choose.
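The three test steps could look roughly like this, as a plain-Java sketch with no JUnit dependency. {{NodeIteratorRaceTest}}, {{TinyBlocksMap}}, {{addBlock}}, {{removeNode}} and {{nodeIterator}} are hypothetical stand-ins rather than the real {{BlocksMap}} API, and the failure mode chosen here is option 3's {{NoSuchElementException}}:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical test sketch. TinyBlocksMap is an illustrative stand-in,
// not Hadoop's BlocksMap; its iterator is assumed to follow option 3.
public class NodeIteratorRaceTest {

  static class TinyBlocksMap {
    private final Map<Long, String[]> blocks = new HashMap<>();

    void addBlock(long blockId, String... nodes) {
      blocks.put(blockId, nodes.clone());
    }

    // Remove a node from a block's slots, shifting later slots down and
    // leaving a trailing null.
    void removeNode(long blockId, String node) {
      String[] slots = blocks.get(blockId);
      for (int i = 0; i < slots.length; i++) {
        if (node.equals(slots[i])) {
          System.arraycopy(slots, i + 1, slots, i, slots.length - i - 1);
          slots[slots.length - 1] = null;
          return;
        }
      }
    }

    // Iterator over the live slot array, so removals are visible mid-iteration.
    Iterator<String> nodeIterator(long blockId) {
      final String[] slots = blocks.get(blockId);
      return new Iterator<String>() {
        private int nextIdx = 0;

        @Override public boolean hasNext() {
          return nextIdx < slots.length && slots[nextIdx] != null;
        }

        @Override public String next() {
          String node = nextIdx < slots.length ? slots[nextIdx] : null;
          if (node == null) {
            throw new NoSuchElementException();
          }
          nextIdx++;
          return node;
        }
      };
    }
  }

  // Returns true if next() failed as expected after the mid-iteration removal.
  static boolean nextFailsAfterRemoval() {
    TinyBlocksMap map = new TinyBlocksMap();      // 1. create a block map
    map.addBlock(1L, "dn1", "dn2");
    Iterator<String> it = map.nodeIterator(1L);   // 2. iterate over a block's nodes
    it.next();                                    // consumes "dn1"
    if (!it.hasNext()) {
      return false;                               // "dn2" should still be present
    }
    map.removeNode(1L, "dn2");                    // 3. remove the next entry
    try {
      it.next();
      return false;                               // got a value: test fails
    } catch (NoSuchElementException expected) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(nextFailsAfterRemoval() ? "PASS" : "FAIL");
  }
}
```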

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

