Date: Wed, 4 May 2011 16:28:03 +0000 (UTC)
From: "sravankorumilli (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <568813353.21980.1304526483345.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1148087811.21961.1304526123120.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-1887) If DataNode gets killed after 'data.dir' is created but before LAYOUT_VERSION is written to the storage file, further restarts of the DataNode throw an EOFException while reading the storage file.

    [ https://issues.apache.org/jira/browse/HDFS-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028811#comment-13028811 ]

sravankorumilli commented on HDFS-1887:
---------------------------------------

As far as I know, this problem can occur while the DataNode is being formatted, or when the storage file is corrupted manually or by some program. Can anyone comment on either of my proposed solutions?

1. Create a new storage file when an EOFException is thrown in DataStorage.isConversionNeeded while reading LAYOUT_VERSION from the storage file.

OR

2. Report the storage state as NOT_FORMATTED when this problem occurs. This would be a problem if the file was corrupted manually or by some program, though: in that case formatting the data directory would not be appropriate.

A sketch of option 1 is given below.
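A minimal sketch of option 1 in Java, assuming the truncated storage file is simply re-initialized in place when EOFException is hit; the LAYOUT_VERSION value and the class/method shape here are illustrative and do not match the real DataStorage API exactly:

// Hedged sketch of option 1: if RandomAccessFile.readInt() hits end-of-file
// because the DataNode died before LAYOUT_VERSION was written, re-initialize
// the storage file instead of failing every restart. Constant value and
// method shape are illustrative, not the actual Hadoop code.
import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class StorageRecoverySketch {

  // Placeholder value; the real layout version depends on the HDFS release.
  static final int LAYOUT_VERSION = -18;

  boolean isConversionNeeded(File storageFile) throws IOException {
    RandomAccessFile raf = new RandomAccessFile(storageFile, "rws");
    try {
      // Normal case: the version was fully written before this restart.
      int onDiskVersion = raf.readInt();
      return onDiskVersion != LAYOUT_VERSION;
    } catch (EOFException e) {
      // The file exists but is empty/truncated: the DataNode was killed after
      // 'data.dir' was created but before LAYOUT_VERSION was written.
      // Re-create the storage contents rather than throwing on every restart.
      raf.setLength(0);
      raf.writeInt(LAYOUT_VERSION);
      return false;
    } finally {
      raf.close();
    }
  }
}

Option 2 would instead surface this condition to analyzeStorage as a NOT_FORMATTED state, which carries the risk noted above of formatting a directory whose storage file was corrupted rather than left half-created.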
> If DataNode gets killed after 'data.dir' is created but before LAYOUT_VERSION is written to the storage file, further restarts of the DataNode throw an EOFException while reading the storage file.
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1887
>                 URL: https://issues.apache.org/jira/browse/HDFS-1887
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20.1, 0.21.0, 0.23.0
>         Environment: Linux
>            Reporter: sravankorumilli
>            Priority: Minor
>
> Assume DataNode gets killed after 'data.dir' is created, but before LAYOUT_VERSION is written to the storage file. On further restarts of the DataNode, an EOFException will be thrown while reading the storage file. The DataNode cannot be restarted successfully until the 'data.dir' is deleted.
> These are the corresponding logs:
> 2011-05-02 19:12:19,389 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.EOFException
>         at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
>         at org.apache.hadoop.hdfs.server.datanode.DataStorage.isConversionNeeded(DataStorage.java:203)
>         at org.apache.hadoop.hdfs.server.common.Storage.checkConversionNeeded(Storage.java:697)
>         at org.apache.hadoop.hdfs.server.common.Storage.access$000(Storage.java:62)
>         at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:476)
>         at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:116)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:336)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:260)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:237)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1440)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1393)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1407)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1552)
> Our Hadoop cluster is managed by cluster-management software that tries to eliminate manual intervention in setting up and managing the cluster. In the scenario described above, however, manual intervention is required to recover the DataNode.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira