[ https://issues.apache.org/jira/browse/HDFS-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124466#comment-13124466 ]

Aaron T. Myers commented on HDFS-2422:
--------------------------------------

Looks like this is happening because {{o.a.h.fs.DF}} will return 0 for "space available" on a directory which doesn't exist:

{noformat}
[01:29:11] atm@simon:~$ hadoop org.apache.hadoop.fs.DF /
df -k null
null 72718632 49480712 19543996 73% null
[01:29:23] atm@simon:~$ hadoop org.apache.hadoop.fs.DF /foo/bar/baz
df -k null
null 0 0 0 0% null
{noformat}

I'm guessing the particular {{dfs.name.dir}} the NN was writing to was in fact a subdirectory of the mount directory, so when the NFS mount went away so did the subdirectory, causing DF to return 0.

I think this is indicative of a more basic issue with the {{NNResourceChecker}} policy, though. When syncing edit logs, the NN is designed to tolerate failure of up to N-1 {{dfs.name.dirs}}, but the {{NNResourceChecker}} will put the NN into safemode if only a single {{dfs.name.dir}} is low on space.

The appropriate solution, then, seems to me to be to change the {{NNResourceChecker}} to also tolerate up to N-1 directories being low on space. I'll create a patch to do this and upload it shortly.

> temporary loss of NFS mount causes NN safe mode
> -----------------------------------------------
>
>                 Key: HDFS-2422
>                 URL: https://issues.apache.org/jira/browse/HDFS-2422
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.24.0
>            Reporter: Jeff Bean
>            Assignee: Aaron T. Myers
>
> We encountered a situation where the namenode dropped into safe mode after a temporary outage of an NFS mount.
> At 12:10 the NFS server goes offline
> Oct 8 12:10:05 kernel: nfs: server not responding, timed out
> This caused the namenode to conclude resource issues:
> 2011-10-08 12:10:34,848 WARN org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available on volume '' is 0, which is below the configured reserved amount 104857600
> Temporary loss of NFS mount shouldn't cause safemode.
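
For illustration, the difference between the current check and the proposed N-1 tolerance could be sketched roughly as follows. This is not the actual {{NameNodeResourceChecker}} code and not the forthcoming patch: the class name {{ResourcePolicySketch}} and both method names are hypothetical. The reserved amount is taken from the log line in the report; the available-space figures reuse the {{df -k}} output from the comment purely as sample values.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the two policies discussed in the comment.
 * None of these names come from the Hadoop source tree.
 */
public class ResourcePolicySketch {

    /** Reserved amount in bytes, as reported in the WARN log line. */
    static final long RESERVED_BYTES = 104857600L;

    /**
     * Behaviour described in the comment: a single directory below the
     * reserved amount is enough to report a resource problem.
     */
    static boolean strictPolicyOk(List<Long> availablePerDir, long reserved) {
        for (long available : availablePerDir) {
            if (available < reserved) {
                return false;
            }
        }
        return true;
    }

    /**
     * Proposed behaviour: tolerate up to N-1 low directories, i.e. report a
     * problem only when no directory has at least the reserved amount.
     */
    static boolean tolerantPolicyOk(List<Long> availablePerDir, long reserved) {
        for (long available : availablePerDir) {
            if (available >= reserved) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample values: one local directory with 19543996 KB available,
        // and one directory on the lost NFS mount for which DF reports 0.
        List<Long> available = Arrays.asList(19543996L * 1024, 0L);

        System.out.println("strict policy ok:   " + strictPolicyOk(available, RESERVED_BYTES));
        System.out.println("tolerant policy ok: " + tolerantPolicyOk(available, RESERVED_BYTES));
    }
}
```

With these sample values the strict check fails because of the single directory reporting 0, while the tolerant check passes because the local directory is still above the reserved amount. If every directory is low, both checks fail.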