Date: Mon, 5 Dec 2011 19:28:39 +0000 (UTC)
From: "Dan Bradley (Created) (JIRA)"
To: hdfs-dev@hadoop.apache.org
Message-ID: <1876438457.42052.1323113319955.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Created] (HDFS-2632) existing in_use.lock file is removed after failing to lock this file

existing in_use.lock file is removed after failing to lock this file
--------------------------------------------------------------------

                 Key: HDFS-2632
                 URL: https://issues.apache.org/jira/browse/HDFS-2632
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
    Affects Versions: 0.21.0
         Environment: Scientific Linux 5.3
            Reporter: Dan Bradley


If an attempt is made to start the namenode when it is already running, an exception is generated on failure to lock in_use.lock. However, there is a bug: in_use.lock is deleted! After that, if another attempt is made to start the namenode, there is no in_use.lock file, so the new instance goes ahead and starts messing with the namenode state files. It eventually fails to bind to the TCP port, but it has already done damage by that time. Specifically, the 'edits' file being written to by the running instance is moved to 'previous.checkpoint' so all further transactions are lost when the HDFS service is next restarted. We observed a case of data loss because of this.

This issue relates to HDFS-1690, but the problem in HDFS-1690 was stated in a way that is specific to -format.
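
For illustration only, here is a minimal sketch of the behavior the report asks for: an instance that fails to lock in_use.lock leaves the file alone, and only the instance holding the lock removes it. This is not Hadoop's storage code. The class InUseLockSketch, its Held type, and the tryAcquire method are hypothetical names invented for this example; only the java.nio calls are real APIs.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for illustration; not the actual namenode implementation. */
public class InUseLockSketch {

    /** Bundles the channel and lock so the owner can release both together. */
    public static final class Held implements AutoCloseable {
        private final Path file;
        private final FileChannel channel;
        private final FileLock lock;

        Held(Path file, FileChannel channel, FileLock lock) {
            this.file = file;
            this.channel = channel;
            this.lock = lock;
        }

        @Override
        public void close() throws IOException {
            lock.release();
            channel.close();
            // Reached only by the instance that actually held the lock.
            Files.deleteIfExists(file);
        }
    }

    /**
     * Tries to lock the given file. Returns null if the lock is already held,
     * and in that case leaves the existing lock file in place.
     */
    public static Held tryAcquire(Path lockFile) throws IOException {
        FileChannel channel = FileChannel.open(
                lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock;
        try {
            // Returns null when another process holds the lock.
            lock = channel.tryLock();
        } catch (OverlappingFileLockException e) {
            // Thrown when the lock is already held inside this same JVM.
            lock = null;
        }
        if (lock == null) {
            // Failure path: close our own handle, but never delete the file,
            // because it belongs to the instance that is still running.
            channel.close();
            return null;
        }
        return new Held(lockFile, channel, lock);
    }
}
```

A caller would treat a null result as "another instance is running" and exit before touching any state files. The key design choice in this sketch is that deletion is tied to ownership of the lock, so a failed start cannot clear the way for a later one.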