Message-ID: <9513755.90331291364231819.JavaMail.jira@thor>
Date: Fri, 3 Dec 2010 03:17:11 -0500 (EST)
From: "Konstantin Shvachko (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] Commented: (HDFS-1508) Ability to do savenamespace without being in safemode
In-Reply-To: <8808900.184241290122113827.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/HDFS-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966448#action_12966448 ]

Konstantin Shvachko commented on HDFS-1508:
-------------------------------------------

> I am unable to write a unit test that would trigger all or any of the races

This is exactly my point. There is a whole chess game going on underneath, with files and directories being moved and threads writing in parallel. Changing the position of one pawn can change the outcome of the game. If saveNamespace() succeeds, we are lucky and the checkpoint fails. If not, somebody has to clean up the mess, and there are lots of failure scenarios. Todd and I once spent quite some time sorting out all of them.

Maybe I am paranoid and your change doesn't change the game, but it needs a convincing argument, which is hard to make. That is why I was asking about the use case instead. I understand that putting the NN in safe mode causes jobs to fail. But why do you need to call saveNamespace()? What is wrong with checkpointing?

> Ability to do savenamespace without being in safemode
> -----------------------------------------------------
>
>                 Key: HDFS-1508
>                 URL: https://issues.apache.org/jira/browse/HDFS-1508
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: savenamespaceWithoutSafemode.txt, savenamespaceWithoutSafemode2.txt, savenamespaceWithoutSafemode3.txt
>
>
> In the current code, the administrator can run savenamespace only after putting the namenode in safemode. This means that applications that are writing to HDFS encounter errors because the NN is in safemode. We would like to allow saveNamespace even when the namenode is not in safemode.
> The savenamespace command already acquires the FSNamesystem writelock. There is no need to require that the namenode is in safemode too.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
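
For readers following the thread, here is a minimal Java sketch of the locking argument made in the issue description. It is a toy in-memory model, not the real FSNamesystem: the class ToyNamesystem and its methods create(), setSafeMode() and saveNamespace() are hypothetical names chosen for illustration. It shows only the difference between rejecting writers in safe mode and making them wait on a write lock. It does not model the storage-directory and checkpoint races raised in the comment above, so it says nothing about whether the proposed change is safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model only. Every name in this class is hypothetical.
class ToyNamesystem {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> paths = new ArrayList<>();
    private volatile boolean safeMode = false;

    void setSafeMode(boolean on) {
        safeMode = on;
    }

    // A namespace mutation: rejected outright while in safe mode,
    // otherwise serialized against other holders of the write lock.
    void create(String path) {
        lock.writeLock().lock();
        try {
            if (safeMode) {
                throw new IllegalStateException(
                        "Cannot create " + path + ": name node is in safe mode");
            }
            paths.add(path);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Take a consistent snapshot under the write lock. A concurrent create()
    // would block until the lock is released instead of failing.
    List<String> saveNamespace() {
        lock.writeLock().lock();
        try {
            return new ArrayList<>(paths);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ToyNamesystem ns = new ToyNamesystem();
        ns.create("/a");

        // Safe-mode route: the writing client sees an error.
        ns.setSafeMode(true);
        try {
            ns.create("/b");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        List<String> firstImage = ns.saveNamespace();
        ns.setSafeMode(false);

        // Lock-only route: the write is accepted and the next save includes it.
        ns.create("/b");
        List<String> secondImage = ns.saveNamespace();
        System.out.println(firstImage + " then " + secondImage);
    }
}
```

In this model the write lock alone is enough to get a consistent snapshot, which is the claim in the description. The concern in the comment is about what happens outside such a lock, for example a checkpoint moving image files while saveNamespace() writes them, and a sketch this small cannot capture that.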