hadoop-hdfs-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-1878) TestHDFSServerPorts unit test failure - race condition in FSNamesystem.close() causes NullPointerException without serious consequence
Date Fri, 17 Jun 2011 20:17:47 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HDFS-1878:
--------------------------------

    Fix Version/s:     (was: 0.20.205.0)

> TestHDFSServerPorts unit test failure - race condition in FSNamesystem.close() causes NullPointerException without serious consequence
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1878
>                 URL: https://issues.apache.org/jira/browse/HDFS-1878
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.204.0
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>            Priority: Minor
>             Fix For: 0.20.204.0
>
>         Attachments: 1878-1.patch
>
>
> In 0.20.204, TestHDFSServerPorts was observed to intermittently throw a NullPointerException.
> This happens only when FSNamesystem.close() is called, which means system termination for
> the Namenode, so it is not a serious bug for 0.20.204. TestHDFSServerPorts is more likely
> than normal execution to trigger the race because it runs two Namenodes in the same JVM,
> which causes more interleaving and more opportunity for the race to show up.
> The race is in FSNamesystem.close(). At line 566 we have:
>       if (replthread != null) replthread.interrupt();
>       if (replmon != null) replmon = null;
> Since the interrupted replthread is not waited on, replmon can be nulled before replthread
> is dead. replthread still references replmon in computeDatanodeWork(), which is where the
> NullPointerException occurs.
> The solution is either to wait on replthread or simply not to null replmon. The latter
> is preferred, since none of the sibling Namenode processing threads are waited on in close().
> I'll attach a patch for .205.
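
The two options described above can be sketched in isolation. This is a minimal illustration, not the attached patch and not the real FSNamesystem code: only the names replthread, replmon and computeDatanodeWork() come from the report. The class CloseRaceSketch, the nested ReplMonitor type, and the methods closeWithoutNulling(), closeAndWait(), awaitWorker() and failure() are hypothetical, introduced only for this example.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for FSNamesystem. Only replthread, replmon and
// computeDatanodeWork() are names taken from the report.
class CloseRaceSketch {
    // Hypothetical stand-in for the replication monitor object.
    static final class ReplMonitor {
        int computeDatanodeWork() { return 1; }
    }

    private volatile ReplMonitor replmon = new ReplMonitor();
    private final AtomicReference<Throwable> failure = new AtomicReference<>();
    private final Thread replthread;

    CloseRaceSketch() {
        replthread = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // Would throw NullPointerException if replmon were
                    // nulled while this loop is still running.
                    replmon.computeDatanodeWork();
                    Thread.yield();
                }
            } catch (Throwable t) {
                failure.set(t); // record anything the worker hit
            }
        });
        replthread.start();
    }

    // Preferred option in the report: interrupt, but do not null replmon.
    void closeWithoutNulling() {
        replthread.interrupt();
    }

    // Alternative option: wait for replthread to die before nulling replmon.
    void closeAndWait() {
        replthread.interrupt();
        try {
            replthread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return; // worker may still be running, so leave replmon in place
        }
        replmon = null;
    }

    // Waits up to the given time; returns true if the worker has stopped.
    boolean awaitWorker(long millis) {
        try {
            replthread.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !replthread.isAlive();
    }

    // Returns whatever the worker thread threw, or null if nothing.
    Throwable failure() {
        return failure.get();
    }

    public static void main(String[] args) {
        CloseRaceSketch a = new CloseRaceSketch();
        a.closeWithoutNulling();
        a.awaitWorker(5000);
        System.out.println("no-null close, worker failure: " + a.failure());

        CloseRaceSketch b = new CloseRaceSketch();
        b.closeAndWait();
        System.out.println("join-then-null close, worker failure: " + b.failure());
    }
}
```

In both variants the worker can no longer observe a null replmon while it is running. The first variant never blocks close(), which matches the stated preference given that the sibling threads are not waited on either.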

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message