Return-Path: X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 856F117DFA for ; Wed, 25 Mar 2015 18:15:09 +0000 (UTC) Received: (qmail 75293 invoked by uid 500); 25 Mar 2015 18:14:53 -0000 Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org Received: (qmail 75240 invoked by uid 500); 25 Mar 2015 18:14:53 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hdfs-issues@hadoop.apache.org Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 75228 invoked by uid 99); 25 Mar 2015 18:14:53 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 25 Mar 2015 18:14:53 +0000 Date: Wed, 25 Mar 2015 18:14:53 +0000 (UTC) From: "Hudson (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HDFS-6353) Check and make checkpoint before stopping the NameNode MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HDFS-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380390#comment-14380390 ] Hudson commented on HDFS-6353: ------------------------------ FAILURE: Integrated in Hadoop-trunk-Commit #7433 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7433/]) HDFS-6353. Check and make checkpoint before stopping the NameNode. Contributed by Jing Zhao. (jing9: rev 5e21e4ca377f68e030f8f3436cd93fd7a74dc5e0) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFetchImage.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogRace.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSAclBaseTest.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestStartup.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeFile.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNNStorageRetentionFunctional.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/ClientNamenodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotBlocksMap.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestParallelImageWrite.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSaveNamespace.java * hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/contrib/bkjournal/TestBootstrapStandbyWithBKJM.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/UpgradeUtilities.java > Check and make checkpoint before stopping the NameNode > ------------------------------------------------------ > > Key: HDFS-6353 > URL: https://issues.apache.org/jira/browse/HDFS-6353 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode > Reporter: Suresh Srinivas > Assignee: Jing Zhao > Attachments: HDFS-6353.000.patch, HDFS-6353.001.patch, HDFS-6353.002.patch > > > One of the failure patterns I have seen is, in some rare circumstances, due to some inconsistency the secondary or standby fails to consume editlog. The only solution when this happens is to save the namespace at the current active namenode. But sometimes when this happens, unsuspecting admin might end up restarting the namenode, requiring more complicated solution to the problem (such as ignore editlog record that cannot be consumed etc.). > How about adding the following functionality: > When checkpointer (standby or secondary) fails to consume editlog, based on a configurable flag (on/off) to let the active namenode know about this failure. Active namenode can enters safemode and saves namespace. When in this type of safemode, namenode UI also shows information about checkpoint failure and that it is saving namespace. Once the namespace is saved, namenode can come out of safemode. > This means service unavailability (even in HA cluster). But it might be worth it to avoid long startup times or need for other manual fixes. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)