From: "Todd Lipcon (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Wed, 28 Apr 2010 03:36:35 -0400 (EDT)
Subject: [jira] Updated: (HDFS-988) saveNamespace can corrupt edits log
Message-ID: <30348911.51641272440195834.JavaMail.jira@thor>
In-Reply-To: <187002905.368941266527487974.JavaMail.jira@brutus.apache.org>

     [ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-988:
-----------------------------

    Attachment: hdfs-988.txt

Attaching an updated patch for trunk. Additions:
- commitBlockSynchronization is no longer allowed in safemode. Konstantin, do you still prefer that we open a new issue for this? Dhruba seems to agree that it should be disallowed.
- startCheckpoint, endCheckpoint, and updatePipeline also check safemode now.
- The new delegation token logging methods check safemode as well.

Some questions for review:
- Will logUpdateMasterKey be OK with the SafeModeException?
- Are there some asserts we could add to make it easier to catch these bugs in the future? For example, we could assert !namesystem.isInSafeMode() in FSEditLog.logSync(). Then, if we ran unit tests with assertions enabled, we'd probably notice if we were accidentally making edits while in safe mode.

Some responses to review above:

bq. FSNamesystem.getAdditionalBlock() checking isInSafeMode() should be before calling chooseTargets(). I would not change getAdditionalBlock() at all.

Right now, getAdditionalBlock is split into two synchronized blocks, and the safe mode status could switch between the two. Are you suggesting that we check safemode in both, or that we combine the blocks into one? I assumed the intent of the split was to avoid doing the potentially CPU-heavy chooseTarget work while synchronized.

bq. renewLease() shouldn't be under FSNamesystem lock? leaseManager has its own lock

This is to prevent safemode from switching while calling renewLease. If we decide that renewing a lease under safemode is not allowed, then we need to synchronize here; otherwise the check is prone to races. Regarding deadlock potential, I think we're safe, since LeaseManager.Monitor synchronizes on FSNamesystem before synchronizing on the lease manager.

bq. Your changes to permission methods incorporate HDFS-133.

Resolved that one as dup, thanks.

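To make the race and the proposed assertion concrete, here is a rough sketch of the idea. It is not taken from the patch: SafeModeSketch, logEdit, and editCount are hypothetical stand-ins for FSNamesystem and FSEditLog, and IllegalStateException stands in for SafeModeException. The point is only that the safemode check and the edit happen under the same lock, and that an assert (active only when the JVM runs with -ea) flags any edit that slips through while in safe mode.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a namesystem; NOT the real FSNamesystem. */
final class SafeModeSketch {
    private boolean safeMode = false;
    // Stand-in for the edits log: one string per logged operation.
    private final List<String> edits = new ArrayList<>();

    public synchronized void enterSafeMode() { safeMode = true; }

    public synchronized void leaveSafeMode() { safeMode = false; }

    public synchronized boolean isInSafeMode() { return safeMode; }

    /**
     * The safemode check and the edit share one synchronized method,
     * so safe mode cannot be switched on between the check and the write.
     */
    public synchronized void renewLease(String holder) {
        if (safeMode) {
            // The real code would throw SafeModeException; this is a stand-in.
            throw new IllegalStateException(
                "Cannot renew lease for " + holder + ": name node is in safe mode");
        }
        logEdit("RENEW_LEASE " + holder);
    }

    /** Appends to the in-memory log; callers must hold the lock on this object. */
    private void logEdit(String op) {
        // Both assertions are evaluated only when the JVM is started with -ea.
        assert Thread.holdsLock(this) : "logEdit called without the namesystem lock";
        assert !safeMode : "edit logged while in safe mode: " + op;
        edits.add(op);
    }

    public synchronized int editCount() { return edits.size(); }

    public static void main(String[] args) {
        SafeModeSketch ns = new SafeModeSketch();
        ns.renewLease("client-1");          // accepted: not in safe mode
        ns.enterSafeMode();
        try {
            ns.renewLease("client-2");      // rejected: in safe mode
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("edits logged: " + ns.editCount());
    }
}
```

If the check were done outside the lock (or in a separate synchronized block from the write), another thread could enter safe mode in between, which is the kind of race discussed above.
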
> saveNamespace can corrupt edits log
> -----------------------------------
>
>                 Key: HDFS-988
>                 URL: https://issues.apache.org/jira/browse/HDFS-988
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Todd Lipcon
>         Attachments: hdfs-988.txt, saveNamespace.txt
>
>
> The administrator puts the namenode in safemode and then issues the saveNamespace command. This can corrupt the edits log. The problem is that when the NN enters safemode, there could still be pending logSyncs occurring from other threads. The saveNamespace command, when executed, would then save an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.