Date: Wed, 8 Aug 2012 13:44:21 +0000 (UTC)
From: "Yanbo Liang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <739990325.152.1344433461038.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Created] (HDFS-3772) HDFS NN will hang in safe mode and never come out if we change the dfs.namenode.replication.min bigger.

Yanbo Liang created HDFS-3772:
---------------------------------

             Summary: HDFS NN will hang in safe mode and never come out if we change the dfs.namenode.replication.min bigger.
                 Key: HDFS-3772
                 URL: https://issues.apache.org/jira/browse/HDFS-3772
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
    Affects Versions: 2.0.0-alpha
            Reporter: Yanbo Liang


If the NN restarts with a new minimum replication (dfs.namenode.replication.min), any files created with the old replication count are expected to be bumped up to the new minimum automatically upon restart. In practice, however, if the NN restarts with a new minimum replication that is larger than the old one, the NN hangs in safe mode and never comes out.
The corresponding test case passes only because some test coverage is missing. This was discussed in HDFS-3734.
The NN exits safe mode once it has received enough reported blocks that satisfy the new minimum replication. However, if the minimum replication is raised, there are not enough blocks that satisfy this requirement.
Look at this code segment in FSNamesystem.java:

    private synchronized void incrementSafeBlockCount(short replication) {
      if (replication == safeReplication) {
        this.blockSafe++;
        checkMode();
      }
    }

The DNs report blocks to the NN, and blockSafe is incremented only when the replication equals safeReplication (which is set to the new minimum replication). If the minimum replication is raised, no block whose replication is lower than the new minimum can satisfy this equality, even though the NN has in fact received complete block information. As a result, blockSafe is not incremented as usual and never reaches the count needed to exit safe mode, so the NN hangs.
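
The behaviour described above can be reproduced with a small stand-alone sketch. This is not the actual FSNamesystem code: SafeModeSketch, reportBlock and countSafeBlocks are hypothetical names introduced only for illustration, and the sketch assumes that the counter method is called with the running replica count each time a DN reports one more replica of a block. Only the equality check itself is taken from the snippet quoted above.

```java
public class SafeModeSketch {
    // Stand-in for the NN's safeReplication, i.e. the new
    // dfs.namenode.replication.min value (hypothetical holder class).
    private final short safeReplication;
    private int blockSafe = 0;

    SafeModeSketch(short safeReplication) {
        this.safeReplication = safeReplication;
    }

    // Same strict-equality check as the quoted snippet; checkMode() is omitted.
    synchronized void incrementSafeBlockCount(short replication) {
        if (replication == safeReplication) {
            this.blockSafe++;
        }
    }

    // Assumption: replicas of a block are reported one at a time, so the
    // running count passed to the counter goes 1, 2, ..., liveReplicas.
    void reportBlock(int liveReplicas) {
        for (short seen = 1; seen <= liveReplicas; seen++) {
            incrementSafeBlockCount(seen);
        }
    }

    // Returns how many blocks end up counted as safe for a given minimum.
    public static int countSafeBlocks(int minReplication, int[] replicasPerBlock) {
        SafeModeSketch sketch = new SafeModeSketch((short) minReplication);
        for (int replicas : replicasPerBlock) {
            sketch.reportBlock(replicas);
        }
        return sketch.blockSafe;
    }

    public static void main(String[] args) {
        // Three blocks, each with 2 replicas (illustrative values).
        int[] replicas = {2, 2, 2};
        System.out.println(countSafeBlocks(2, replicas)); // prints 3
        System.out.println(countSafeBlocks(3, replicas)); // prints 0
    }
}
```

With the minimum left at 2, every block is counted once. With the minimum raised to 3, no block is ever counted even though all replicas were reported, so a threshold based on blockSafe could never be met in this model.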