Date: Fri, 31 Aug 2012 01:23:07 +1100 (NCT)
From: "Tsz Wo (Nicholas), SZE (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <2136795542.16669.1346336587617.JavaMail.jiratomcat@arcas>
In-Reply-To: <193540206.20289.1339797162478.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1

    [ https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444975#comment-13444975 ]

Tsz Wo (Nicholas), SZE commented on HDFS-3540:
----------------------------------------------

{quote}
Recovery mode will always prompt before doing anything which could lead to data loss. So no, stray OP_INVALID bytes will not lead to silent data loss.
Actually, looking at change 1349086, which was introduced by HDFS-3521, I see that it broke end-of-file checking by default. Since dfs.namenode.edits.toleration.length is -1 by default, FSEditLog#checkEndOfLog is never invoked. However, this is not a problem with Recovery Mode; it's a problem with change 1349086.
{quote}

Before HDFS-3521, there was an UNCHECKED_REGION_LENGTH in Recovery Mode. If a stray OP_INVALID byte fell within the unchecked region, it would cause silent data loss.

{quote}
Recovery Mode does consider the corruption length. The location at which the problem occurred is printed out. This is the message "Failed to parse edit log () at position , edit log length is ..." This information is provided to allow the system administrator to make an informed decision.
{quote}

You still do not know the corruption length, since there may be padding at the end. System admins won't know the padding length, so they won't be able to determine the corruption length.

> Further improvement on recovery mode and edit log toleration in branch-1
> ------------------------------------------------------------------------
>
>                 Key: HDFS-3540
>                 URL: https://issues.apache.org/jira/browse/HDFS-3540
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 1.2.0
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>
> *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1. However, the recovery mode feature in branch-1 is dramatically different from the recovery mode in trunk, since the edit log implementations in these two branches are different. For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not in trunk.
> *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.
> There are overlaps between these two features. We study potential further improvements in this issue.

--
This message is automatically generated by JIRA.
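
To make the padding argument in the comment above concrete, here is a minimal Java sketch of how a corruption length could be derived once the padding length is known. It is not the branch-1 implementation: the class name EditLogTail, both method names, and the choice of 0xFF as the padding byte are assumptions made for illustration only.

```java
// Hypothetical illustration only; not code from FSEditLog or branch-1.
final class EditLogTail {
    // Assumption: padding is a run of this byte value at the end of the file.
    static final byte PADDING_BYTE = (byte) 0xFF;

    /**
     * Counts trailing padding bytes located after the last successfully
     * parsed byte. validLength plays the role of the "position" reported
     * when parsing fails.
     */
    static int paddingLength(byte[] log, int validLength) {
        int end = log.length;
        while (end > validLength && log[end - 1] == PADDING_BYTE) {
            end--;
        }
        return log.length - end;
    }

    /**
     * Corruption length = file length - valid length - padding length.
     * Corrupt bytes at the very end that happen to equal the padding byte
     * are counted as padding, so the result is a lower bound.
     */
    static int corruptionLength(byte[] log, int validLength) {
        return log.length - validLength - paddingLength(log, validLength);
    }
}
```

The failure position and the edit log length alone give only `log.length - validLength`, which is the sum of corruption and padding. Separating the two requires scanning the tail of the file, which is the information the comment says a system administrator does not have.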