hadoop-hdfs-issues mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1878) TestHDFSServerPorts unit test failure - race condition in FSNamesystem.close() causes NullPointerException without serious consequence
Date Tue, 03 May 2011 16:11:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028286#comment-13028286 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1878:
----------------------------------------------

+1 patch looks good.

> TestHDFSServerPorts unit test failure - race condition in FSNamesystem.close() causes NullPointerException without serious consequence
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1878
>                 URL: https://issues.apache.org/jira/browse/HDFS-1878
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.204.0
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>            Priority: Minor
>             Fix For: 0.20.205.0
>
>         Attachments: 1878-1.patch
>
>
> In 20.204, TestHDFSServerPorts was observed to intermittently throw a NullPointerException.
> This only happens when FSNamesystem.close() is called, which means system termination for
> the Namenode, so this is not a serious bug for .204.  TestHDFSServerPorts is more likely than
> normal execution to stimulate the race, because it runs two Namenodes in the same JVM, causing
> more interleaving and more potential to see a race condition.
> The race is in FSNamesystem.close(). At line 566, we have:
>       if (replthread != null) replthread.interrupt();
>       if (replmon != null) replmon = null;
> Since the interrupted replthread is not waited on, replmon can be nulled before replthread
> is dead. replthread still references replmon in computeDatanodeWork(), which is where the
> NullPointerException occurs.
> The solution is either to wait on replthread or simply not to null replmon.  The latter
> is preferred, since none of the sibling Namenode processing threads are waited on in close().
> I'll attach a patch for .205.
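
For readers following the thread, the two fixes described above can be sketched in a small standalone Java class. This is an illustration only and is not the attached patch. The class CloseRaceSketch, the methods closeWithoutNulling(), closeWithJoin(), awaitWorker() and workerFailure(), and the use of a plain Runnable for replmon are all hypothetical stand-ins. Only the field names replmon and replthread come from the report.

```java
// Hypothetical sketch of the shutdown race; NOT the real FSNamesystem code.
class CloseRaceSketch {
    // Stand-ins for the replmon / replthread fields named in the report.
    private volatile Runnable replmon;
    private Thread replthread;
    private volatile Throwable workerFailure;

    void start() {
        replmon = () -> { /* pretend to compute datanode work */ };
        replthread = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // If close() had already nulled replmon, this call
                    // would throw the NullPointerException from the report.
                    replmon.run();
                    Thread.sleep(1);
                }
            } catch (InterruptedException e) {
                // Interrupted while sleeping: normal shutdown path.
            } catch (Throwable t) {
                workerFailure = t; // recorded so a caller can inspect it
            }
        });
        replthread.start();
    }

    // Option preferred in the report: interrupt, but never null replmon.
    void closeWithoutNulling() {
        if (replthread != null) replthread.interrupt();
    }

    // Alternative: wait for the worker to die before dropping the reference.
    void closeWithJoin() {
        if (replthread != null) {
            replthread.interrupt();
            awaitWorker();
        }
        replmon = null; // safe only because the worker has already exited
    }

    // Blocks until the worker thread has terminated.
    void awaitWorker() {
        try {
            replthread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
    }

    Throwable workerFailure() {
        return workerFailure;
    }
}
```

In this sketch, closeWithoutNulling() never blocks the caller, which matches the reasoning in the report that none of the sibling threads are waited on in close(). closeWithJoin() makes the caller wait until the worker has exited.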

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira