Message-ID: <359721088.1236706370478.JavaMail.jira@brutus>
Date: Tue, 10 Mar 2009 10:32:50 -0700 (PDT)
From: "Konstantin Shvachko (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-5453) Could FSEditLog report problems more elegantly than with System.exit(-1)
In-Reply-To: <103185134.1236694970560.JavaMail.jira@brutus>

[
https://issues.apache.org/jira/browse/HADOOP-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680545#action_12680545 ]

Konstantin Shvachko commented on HADOOP-5453:
---------------------------------------------

FSEditLog calls {{System.exit(-1)}} when there are no more edit streams to write the name-space modifications to. No streams means the name-space state is no longer persistent and may not be current when the name-node restarts. So this is not about reporting problems but rather about the consistency of the system: if the system cannot persist changes, it dies.
I agree, though, that dying might not be the most elegant solution.

Now that we have the "saveNamespace" command, the loss of all edit streams can be treated as just switching to safe mode. When the local disks are restored, the administrator can save the namespace. Alternatively, a secondary node can be started to perform an emergency checkpoint.

> Could FSEditLog report problems more elegantly than with System.exit(-1)
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-5453
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5453
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.21.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> When FSEdit encounters problems, it prints something and then exits.
> It would be better for any in-JVM deployments of FSEdit for these to be raised in some other way (such as throwing an exception), rather than taking down the whole JVM. That could be in JUnit tests, or it could be inside other applications. Test runners and the like can intercept those System.exit() calls with their own Security Manager, often turning the System.exit() operation into an exception there and then. If FSEdit did that itself, it may be easier to stay in control.
> The current approach has some benefits - it can exit regardless of which thread has encountered problems, but it is tricky to test.
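
To make the two suggestions in this thread concrete, here is a minimal sketch of how they might fit together. It is not Hadoop's FSEditLog code: EditLogSketch, NoStreamsHandler, and every method name below are hypothetical and exist only for illustration. The idea is that losing the last edit stream flips the log into safe mode (as proposed in the comment) and then calls a pluggable handler, so an in-JVM caller or a JUnit test can choose an exception instead of System.exit(-1) (as requested in the issue).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical hook: the embedding process decides what "fatal" means. */
interface NoStreamsHandler {
    void onNoEditStreams(String message) throws IOException;
}

/** Heavily simplified stand-in for an edit log; NOT Hadoop's FSEditLog. */
class EditLogSketch {
    private final List<String> streams;     // names stand in for real edit streams
    private final NoStreamsHandler handler;
    private boolean safeMode = false;

    EditLogSketch(List<String> streamNames, NoStreamsHandler handler) {
        this.streams = new ArrayList<>(streamNames);
        this.handler = handler;
    }

    /** Drops a failed stream; once none remain, enters safe mode and escalates. */
    void removeStream(String name) throws IOException {
        streams.remove(name);
        if (streams.isEmpty()) {
            safeMode = true; // changes can no longer be persisted, so stop accepting them
            handler.onNoEditStreams("No edit streams are accessible");
        }
    }

    /** Rejects modifications while in safe mode instead of killing the JVM. */
    void logEdit(String op) throws IOException {
        if (safeMode) {
            throw new IOException("Edit log is in safe mode, rejecting: " + op);
        }
        // A real edit log would append op to every remaining stream here.
    }

    boolean isInSafeMode() {
        return safeMode;
    }
}

class EditLogDemo {
    public static void main(String[] args) {
        // Test-friendly handler: raise an exception rather than exit the JVM.
        NoStreamsHandler throwing = msg -> { throw new IOException(msg); };
        EditLogSketch log = new EditLogSketch(Arrays.asList("disk1", "disk2"), throwing);
        try {
            log.logEdit("mkdir /a");
            log.removeStream("disk1");
            log.removeStream("disk2"); // last stream lost: the handler throws
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println("safe mode: " + log.isInSafeMode());
    }
}
```

A standalone daemon could keep the current behaviour by passing a handler that prints the message and calls System.exit(-1), while tests and embedded deployments pass the throwing handler shown in main.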