hadoop-hdfs-dev mailing list archives

From "Yanbo Liang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-3772) HDFS NN will hang in safe mode and never come out if we change the dfs.namenode.replication.min bigger.
Date Wed, 08 Aug 2012 13:44:20 GMT
Yanbo Liang created HDFS-3772:
---------------------------------

             Summary: HDFS NN will hang in safe mode and never come out if we change the dfs.namenode.replication.min
bigger.
                 Key: HDFS-3772
                 URL: https://issues.apache.org/jira/browse/HDFS-3772
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
    Affects Versions: 2.0.0-alpha
            Reporter: Yanbo Liang


If the NN restarts with a new minimum replication (dfs.namenode.replication.min), any files
created with the old replication count are expected to be bumped up to the new minimum
automatically upon restart. In practice, however, if the NN restarts with a new minimum
replication that is larger than the old one, the NN hangs in safe mode and never comes out.
The corresponding test case passes only because some test coverage is missing. This was
discussed in HDFS-3734.
The NN exits safe mode once it has received enough reported blocks that satisfy the new
minimum replication. However, if the minimum replication is increased, there will not be
enough blocks that satisfy this condition.
Look at the code segment in FSNamesystem.java:
    private synchronized void incrementSafeBlockCount(short replication) {
      if (replication == safeReplication) {
        this.blockSafe++;
        checkMode();
      }
    }
The DNs report blocks to the NN, and if a block's replication equals safeReplication (which
is assigned the new minimum replication), blockSafe is incremented. But if the minimum
replication is increased, none of the blocks whose replication is lower than the new value
can satisfy this equality, even though the NN has actually received complete block
information. As a result, blockSafe is not incremented as usual, never reaches the count
needed to exit safe mode, and the NN hangs.
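
The behaviour described above can be illustrated with a small standalone sketch. This is not the FSNamesystem code: the class SafeModeSketch, its method names, and the simplified threshold calculation are hypothetical and exist only for illustration. It assumes each replica report triggers one equality check against safeReplication, as in the quoted snippet, and that safe mode can be left once blockSafe reaches a fraction (here 0.999, an assumed value) of the total block count.

```java
// Hypothetical, simplified model of the safe-block accounting described above.
// It is NOT the real FSNamesystem implementation.
class SafeModeSketch {

    // Counts how many blocks are marked "safe" when every replica report
    // triggers an equality check against safeReplication.
    static int countSafeBlocks(int[] replicasPerBlock, int safeReplication) {
        int blockSafe = 0;
        for (int replicas : replicasPerBlock) {
            // Replicas of one block are reported one at a time: 1, 2, ..., replicas.
            for (int reported = 1; reported <= replicas; reported++) {
                if (reported == safeReplication) {
                    blockSafe++;
                }
            }
        }
        return blockSafe;
    }

    // Assumed exit rule: blockSafe must reach threshold * total blocks
    // (truncated to an int to keep the sketch simple).
    static boolean canLeaveSafeMode(int[] replicasPerBlock, int safeReplication,
                                    double threshold) {
        int blockThreshold = (int) (replicasPerBlock.length * threshold);
        return countSafeBlocks(replicasPerBlock, safeReplication) >= blockThreshold;
    }

    public static void main(String[] args) {
        // Four blocks, each written while the minimum replication was 1.
        int[] blocks = {1, 1, 1, 1};
        // Old minimum replication of 1: every block is counted as safe.
        System.out.println(canLeaveSafeMode(blocks, 1, 0.999)); // prints "true"
        // New minimum replication of 2: no block ever reaches 2 replicas,
        // so blockSafe stays at 0 and safe mode is never left.
        System.out.println(canLeaveSafeMode(blocks, 2, 0.999)); // prints "false"
    }
}
```

In this model, the second call returns false no matter how many reports arrive, because no existing block can reach the increased minimum on its own, which matches the hang described in this report.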

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message