hadoop-hdfs-issues mailing list archives

From "He Xiaoqiao (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-9068) SBN checkpoint could not work after the only name directory recovery from failure
Date Sun, 13 Sep 2015 06:07:45 GMT
He Xiaoqiao created HDFS-9068:
---------------------------------

             Summary: SBN checkpoint could not work after the only name directory recovery
from failure
                 Key: HDFS-9068
                 URL: https://issues.apache.org/jira/browse/HDFS-9068
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.4.1
            Reporter: He Xiaoqiao


The SBN (standby NameNode) periodically checkpoints to {{dfs.namenode.name.dir}}, but the
checkpointer stops working when {{dfs.namenode.name.dir}} is configured with only one
directory and the disk holding that directory recovers from a failure.
The affected class is org.apache.hadoop.hdfs.server.namenode.FSImage:
{code:title=org.apache.hadoop.hdfs.server.namenode.FSImage.java|borderStyle=solid}
@Override
public void run() {
  try {
    saveFSImage(context, sd, nnf);
  } catch (SaveNamespaceCancelledException snce) {
    LOG.info("Cancelled image saving for " + sd.getRoot() +
        ": " + snce.getMessage());
    // don't report an error on the storage dir!
  } catch (Throwable t) {
    LOG.error("Unable to save image for " + sd.getRoot(), t);
    context.reportErrorOnStorageDirectory(sd);
  }
}
{code}
When {{saveFSImage(context, sd, nnf)}} fails because the storage is unavailable or has failed,
sd is added to errorSDs via {{context.reportErrorOnStorageDirectory(sd)}} and is never used
again, even after the storage recovers from the failure. Every later checkpoint then fails, so
the JournalNodes accumulate a large number of editlog files and a NameNode restart will take a
long time.
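
The behaviour described above can be reproduced with a toy model. This is only a sketch, not Hadoop code: {{CheckpointSketch}}, {{Dir}}, {{healthy}} and the {{retryErrorDirs}} switch are hypothetical names made up for illustration, and the "retry" branch is one possible direction for a fix rather than the actual patch. It shows that once the only directory has been moved to the error list, checkpoints keep saving to zero directories after the disk is healthy again, unless something re-checks the error list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy model of the reported failure mode; none of these names are Hadoop APIs. */
class CheckpointSketch {
    /** Hypothetical stand-in for a storage directory on a disk that can fail. */
    static class Dir {
        final String root;
        boolean healthy = true;
        Dir(String root) { this.root = root; }
    }

    private final List<Dir> activeDirs = new ArrayList<>();
    private final List<Dir> errorDirs = new ArrayList<>();   // plays the role of errorSDs
    private final boolean retryErrorDirs;                    // assumption: possible fix switch

    CheckpointSketch(boolean retryErrorDirs, Dir... dirs) {
        this.retryErrorDirs = retryErrorDirs;
        for (Dir d : dirs) {
            activeDirs.add(d);
        }
    }

    /** Runs one checkpoint and returns how many directories the image was saved to. */
    int checkpoint() {
        if (retryErrorDirs) {
            // Possible fix: give directories whose disk has recovered another chance.
            for (Iterator<Dir> it = errorDirs.iterator(); it.hasNext();) {
                Dir d = it.next();
                if (d.healthy) {
                    it.remove();
                    activeDirs.add(d);
                }
            }
        }
        int saved = 0;
        for (Iterator<Dir> it = activeDirs.iterator(); it.hasNext();) {
            Dir d = it.next();
            if (d.healthy) {
                saved++;               // image written successfully
            } else {
                it.remove();           // mirrors reportErrorOnStorageDirectory(sd)
                errorDirs.add(d);
            }
        }
        return saved;
    }

    public static void main(String[] args) {
        for (boolean retry : new boolean[] {false, true}) {
            Dir only = new Dir("/data/nn");   // the single configured name directory
            CheckpointSketch nn = new CheckpointSketch(retry, only);
            int before = nn.checkpoint();
            only.healthy = false;             // disk fails
            int during = nn.checkpoint();
            only.healthy = true;              // disk recovers
            int after = nn.checkpoint();
            System.out.println("retry=" + retry + " saved: "
                + before + ", " + during + ", " + after);
        }
    }
}
```

Without the retry step the model prints {{saved: 1, 0, 0}}, which matches the report: the recovered directory is never written again. With the retry step it prints {{saved: 1, 0, 1}}.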



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
