From: "Colin Patrick McCabe (Created) (JIRA)"
To: hdfs-dev@hadoop.apache.org
Date: Thu, 23 Feb 2012 19:24:48 +0000 (UTC)
Message-ID: <240274811.11005.1330025088805.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Created] (HDFS-3004) Create Offline NameNode recovery tool

Create Offline NameNode recovery tool
-------------------------------------

Key: HDFS-3004
URL: https://issues.apache.org/jira/browse/HDFS-3004
Project: Hadoop HDFS
Issue Type: New Feature
Components: tools
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe

We've been talking about creating a tool which can process NameNode edit logs and image files offline.

This tool would be similar to fsck for a conventional filesystem. It would detect inconsistencies and malformed data. Where a fix is possible and the operator asks for it, the tool would try to correct the inconsistency.

It's probably better to call this "nameNodeRecovery" or similar, rather than "fsck," since we already have a separate and unrelated mechanism which we refer to as fsck.

The use case here is that the NameNode data is corrupt for some reason, and we want to fix it. Obviously, we would prefer never to get into this situation, and in a perfect world we never would. However, bad data on disk can happen from time to time because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and can result in downtime.

I would like to reuse as much code as possible from the NameNode in this tool. Hopefully, the effort spent developing this will also make the NameNode editLog and image processing even more robust than it already is.

Another approach that we have discussed is NOT having an offline tool, but just having a switch supplied to the NameNode, like "--auto-fix" or "--force-fix". In that case, the NameNode would attempt to "guess" when data was missing or incomplete in the EditLog or Image, rather than aborting as it does now. Like the proposed fsck tool, this switch could be used to get users back on their feet quickly after a problem developed. I am not in favor of this approach, because there is a danger that users could supply this flag in cases where it is not appropriate. This risk does not exist for an offline fsck tool, since it would have to be run explicitly. However, I wanted to mention this proposal here for completeness.
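
To make the "detect, and repair only when the operator asks" behavior described above concrete, here is a minimal sketch. It is written in Python for brevity rather than reusing NameNode code, and everything in it is an assumption made for illustration: the `EditRecord` type, the `check_edits` function, the rule that transaction IDs must be consecutive, and the truncate-at-first-problem repair policy are hypothetical and do not reflect the real edit log format or any agreed design.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EditRecord:
    """Toy stand-in for one edit log entry (not the real HDFS format)."""
    txid: int
    op: Optional[str]  # None models a malformed record


def check_edits(records: List[EditRecord],
                fix: bool = False) -> Tuple[List[EditRecord], List[str]]:
    """Scan records offline and report the first problem found.

    With fix=False the records are returned unchanged (report only).
    With fix=True the log is truncated just before the first problem.
    Truncation is only one possible repair policy, assumed here for simplicity.
    """
    problems: List[str] = []
    expected = records[0].txid if records else 0
    for index, record in enumerate(records):
        if record.op is None:
            problems.append(f"record {index}: malformed operation")
        elif record.txid != expected:
            problems.append(
                f"record {index}: expected txid {expected}, found {record.txid}")
        if problems:
            # Stop at the first problem: later records cannot be trusted.
            kept = records[:index] if fix else records
            return list(kept), problems
        expected += 1
    return list(records), problems


if __name__ == "__main__":
    log = [EditRecord(1, "mkdir"), EditRecord(2, "create"), EditRecord(4, "delete")]
    _, report = check_edits(log)            # default: detect only
    print(report)
    repaired, _ = check_edits(log, fix=True)  # repair must be requested explicitly
    print([r.txid for r in repaired])
```

The repair path is off by default and has to be requested with `fix=True`, which mirrors the argument above that a recovery action should only happen when the operator runs it deliberately.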
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira