Date: Wed, 18 May 2011 14:31:48 +0000 (UTC)
From: "ramkrishna.s.vasudevan (JIRA)"
To: common-issues@hadoop.apache.org
Reply-To: common-issues@hadoop.apache.org
Message-ID: <528970851.22443.1305729108182.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HADOOP-5342) DataNodes do not start up because InconsistentFSStateException on just part of the disks in use

[ 
https://issues.apache.org/jira/browse/HADOOP-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035405#comment-13035405 ]

ramkrishna.s.vasudevan commented on HADOOP-5342:
------------------------------------------------

I would like to suggest a change; please correct me if I am wrong.

The namespace ID is currently updated as soon as one of the disks listed in dfs.data.dir has been updated. Instead, the namespace ID should be updated only after all of the dfs.data.dir storage directories have been parsed.

> DataNodes do not start up because InconsistentFSStateException on just part of the disks in use
> -----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5342
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5342
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.18.2
>            Reporter: Christian Kunz
>            Assignee: Hairong Kuang
>            Priority: Critical
>
> After restarting a cluster (including rebooting) the dfs got corrupted because many DataNodes did not start up, running into the following exception:
> 2009-02-26 22:33:53,774 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory xxx is in an inconsistent state: version file in current directory is missing.
> 	at org.apache.hadoop.dfs.Storage$StorageDirectory.analyzeStorage(Storage.java:326)
> 	at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:105)
> 	at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:306)
> 	at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:223)
> 	at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:3030)
> 	at org.apache.hadoop.dfs.DataNode.instantiateDataNode(DataNode.java:2985)
> 	at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2993)
> 	at org.apache.hadoop.dfs.DataNode.main(DataNode.java:3115)
> This happens when using multiple disks with at least one previously marked as read-only, such that its storage version became outdated, but after reboot it was mounted read-write, resulting in the DataNode not starting because of the outdated version.
> This is a big headache. If a DataNode has multiple disks of which at least one has the correct storage version, then outdated versions should not bring down the DataNode.
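
The suggestion in the comment (parse every dfs.data.dir storage directory first, and only then settle on the namespace ID) might look roughly like the sketch below. This is not the actual DataStorage code and does not use any Hadoop API. The names StorageDirSketch, DirState, ScanResult and scan are hypothetical, and a null namespace ID is just an assumed way to model a missing or unreadable version file.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these types exist in Hadoop.
final class StorageDirSketch {

    /** Hypothetical stand-in for one dfs.data.dir entry. */
    static final class DirState {
        final String path;
        final Integer namespaceId; // null models a missing or unreadable version file

        DirState(String path, Integer namespaceId) {
            this.path = path;
            this.namespaceId = namespaceId;
        }
    }

    /** Outcome of scanning all directories. */
    static final class ScanResult {
        final int namespaceId;
        final List<String> usable;
        final List<String> skipped;

        ScanResult(int namespaceId, List<String> usable, List<String> skipped) {
            this.namespaceId = namespaceId;
            this.usable = usable;
            this.skipped = skipped;
        }
    }

    /**
     * Parses every directory before committing to a namespace ID.
     * Directories whose stored ID is missing or does not match are skipped
     * instead of aborting start-up, as long as at least one directory matches.
     */
    static ScanResult scan(List<DirState> dirs, int expectedNamespaceId) {
        List<String> usable = new ArrayList<>();
        List<String> skipped = new ArrayList<>();

        // Pass 1: inspect all directories; no shared state is modified here.
        for (DirState d : dirs) {
            if (d.namespaceId != null && d.namespaceId == expectedNamespaceId) {
                usable.add(d.path);
            } else {
                skipped.add(d.path);
            }
        }

        // Pass 2: decide once, after every directory has been parsed.
        if (usable.isEmpty()) {
            throw new IllegalStateException("no storage directory with a consistent state");
        }
        return new ScanResult(expectedNamespaceId, usable, skipped);
    }
}
```

With directories /disk1 (ID 42), /disk2 (no version file) and /disk3 (ID 7) and an expected ID of 42, this sketch keeps /disk1 and skips the other two rather than failing on the first inconsistent directory. Whether skipped directories should be reformatted, upgraded or only reported is left open here.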