hadoop-hdfs-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-889) Possible race condition in BlocksMap.NodeIterator.
Date Fri, 14 May 2010 15:23:44 GMT

    [ https://issues.apache.org/jira/browse/HDFS-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867527#action_12867527 ]

Steve Loughran commented on HDFS-889:

>Is this just a test bug?
I don't know. We've only seen it in tests, but that doesn't mean it hasn't happened out
in the field.

> Possible race condition in BlocksMap.NodeIterator.
> --------------------------------------------------
>                 Key: HDFS-889
>                 URL: https://issues.apache.org/jira/browse/HDFS-889
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Steve Loughran
> Hudson's test run for HDFS-165 is showing an NPE in {{org.apache.hadoop.hdfs.server.namenode.TestNodeCount.testNodeCount()}}
> One problem could be in {{BlocksMap.NodeIterator}}. Its {{hasNext()}} method checks that
the next entry isn't null. But what if, between the {{hasNext()}} call and the {{next()}} operation,
the map changes and an entry goes away? In that situation, the node returned from {{next()}} will
be null.
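
The interleaving described above can be replayed deterministically in a single thread. The sketch below is a simplified stand-in, not the Hadoop source: {{SlotIterator}}, the {{String[]}} slot array and the node names are all hypothetical, and the "other thread" is simulated by clearing a slot between the two calls.

```java
import java.util.Iterator;

/** Hypothetical stand-in for the pattern described above; not the Hadoop source. */
class RacyIteratorSketch {

    /** Iterates over a fixed array of slots; a null slot marks the end. */
    static class SlotIterator implements Iterator<String> {
        private final String[] slots;
        private int nextIdx = 0;

        SlotIterator(String[] slots) {
            this.slots = slots;
        }

        @Override
        public boolean hasNext() {
            // Only checks the state of the slot at the moment of the call.
            return nextIdx < slots.length && slots[nextIdx] != null;
        }

        @Override
        public String next() {
            // No re-check: returns whatever is in the slot now, possibly null.
            return slots[nextIdx++];
        }
    }

    /** Replays the race in one thread: the slot is cleared between hasNext() and next(). */
    static String nextAfterSimulatedRemoval() {
        String[] slots = {"dn1", "dn2"};
        SlotIterator it = new SlotIterator(slots);
        it.next();                       // consume "dn1"
        boolean sawNext = it.hasNext();  // true: "dn2" is still present
        slots[1] = null;                 // simulated removal by another thread
        return sawNext ? it.next() : "unreachable";
    }

    public static void main(String[] args) {
        // The caller was told there is a next element, yet receives null.
        System.out.println(nextAfterSimulatedRemoval());
    }
}
```

Any caller that dereferences the returned value without a null check would hit an NPE at this point.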
> This is potentially serious, as a quick look through the code shows that the iterator
gets retrieved a lot and everywhere Hadoop does so, it assumes the value is not null. It's
also one of those problems that doesn't have a simple "make it go away" fix.
> Options
> # Ignore it, hope it doesn't happen very often and that the test failure was a one-off that
will never happen in a production datacentre. This is the default. The iterator is only used
in the namenode, so while it does depend on the number of datanodes, it isn't running on 4000 machines
in a big cluster.
> # Leave the iterator as is, have all the in-Hadoop code check for a null value and break
the loop.
> # Patch the {{NodeIterator}} to be consistent with the {{Iterator}} specification and
throw a {{NoSuchElementException}} if the next value is null. This does not make the problem
go away, but now it is handled by having every use in-Hadoop catching the exception at the
right point and exiting the loop. 
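
As a rough illustration of the third option, the sketch below shows an iterator whose {{next()}} throws {{NoSuchElementException}} instead of returning null, plus a caller that treats the exception as the end of the loop. Everything here is an assumption for illustration ({{StrictSlotIterator}}, {{collect}}, the {{Runnable}} hook that stands in for a concurrent mutation); it is not the actual {{NodeIterator}} patch, and without proper synchronization it only changes how the failure surfaces.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch of option 3; not the actual NodeIterator patch. */
class StrictIteratorSketch {

    static class StrictSlotIterator implements Iterator<String> {
        private final String[] slots;
        private int nextIdx = 0;

        StrictSlotIterator(String[] slots) {
            this.slots = slots;
        }

        @Override
        public boolean hasNext() {
            return nextIdx < slots.length && slots[nextIdx] != null;
        }

        @Override
        public String next() {
            // Read the slot once, so the check and the returned value agree.
            String node = nextIdx < slots.length ? slots[nextIdx] : null;
            if (node == null) {
                throw new NoSuchElementException("entry removed or iteration finished");
            }
            nextIdx++;
            return node;
        }
    }

    /** Caller side: collect nodes, exiting the loop if an entry vanishes. */
    static List<String> collect(Iterator<String> it, Runnable betweenCalls) {
        List<String> seen = new ArrayList<>();
        try {
            while (it.hasNext()) {
                betweenCalls.run(); // stands in for another thread's mutation
                seen.add(it.next());
            }
        } catch (NoSuchElementException e) {
            // The entry went away after hasNext(); stop iterating.
        }
        return seen;
    }

    /** Removes "dn2" just before the second next() call. */
    static List<String> demo() {
        String[] slots = {"dn1", "dn2", "dn3"};
        int[] calls = {0};
        return collect(new StrictSlotIterator(slots), () -> {
            if (++calls[0] == 2) {
                slots[1] = null;
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(demo()); // only "dn1" is collected
    }
}
```

The second option would look similar on the caller side, with a {{if (node == null) break;}} check in place of the {{catch}} block.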
> Testing. This should be possible.
> # Create a block map
> # Iterate over a block
> # While the iterator is in progress, remove the next block in the list. Expect the next
call to {{next()}} to fail in whatever way you choose.
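
The test steps above could be sketched roughly as follows. This uses a hypothetical miniature {{TinyBlocksMap}} (block id mapped to an array of datanode names) rather than the real {{BlocksMap}} API, removes the next entry in the block's node list, and assumes the chosen failure mode is {{NoSuchElementException}}.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical test sketch; TinyBlocksMap is a stand-in, not Hadoop's BlocksMap. */
class RemovalDuringIterationSketch {

    /** Miniature block map: block id -> datanode slots. */
    static class TinyBlocksMap {
        private final Map<Long, String[]> blocks = new HashMap<>();

        void addBlock(long id, String... nodes) {
            blocks.put(id, nodes.clone());
        }

        void removeNode(long id, int idx) {
            blocks.get(id)[idx] = null;
        }

        Iterator<String> nodeIterator(long id) {
            final String[] slots = blocks.get(id);
            return new Iterator<String>() {
                private int nextIdx = 0;

                @Override
                public boolean hasNext() {
                    return nextIdx < slots.length && slots[nextIdx] != null;
                }

                @Override
                public String next() {
                    String node = nextIdx < slots.length ? slots[nextIdx] : null;
                    if (node == null) {
                        throw new NoSuchElementException();
                    }
                    nextIdx++;
                    return node;
                }
            };
        }
    }

    /** Returns true if next() fails in the chosen way after the removal. */
    static boolean nextFailsAfterRemoval() {
        TinyBlocksMap map = new TinyBlocksMap();      // 1. create a block map
        map.addBlock(1L, "dn1", "dn2", "dn3");
        Iterator<String> it = map.nodeIterator(1L);   // 2. iterate over a block
        it.next();
        if (!it.hasNext()) {
            return false;                             // precondition: an entry is pending
        }
        map.removeNode(1L, 1);                        // 3. remove the next entry mid-iteration
        try {
            it.next();
            return false;                             // no failure: the race went unnoticed
        } catch (NoSuchElementException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(nextFailsAfterRemoval());
    }
}
```

In a real test the removal would go through the namenode's own code paths, and the expected failure would match whichever option is adopted.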

