Return-Path:
X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 57D986C66 for ; Thu, 26 May 2011 07:30:54 +0000 (UTC)
Received: (qmail 1459 invoked by uid 500); 26 May 2011 07:30:54 -0000
Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org
Received: (qmail 1306 invoked by uid 500); 26 May 2011 07:30:38 -0000
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: hdfs-issues@hadoop.apache.org
Delivered-To: mailing list hdfs-issues@hadoop.apache.org
Received: (qmail 1181 invoked by uid 99); 26 May 2011 07:30:32 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 26 May 2011 07:30:32 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD
X-Spam-Check-By: apache.org
Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 26 May 2011 07:30:28 +0000
Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 790B1DF8CA for ; Thu, 26 May 2011 07:29:47 +0000 (UTC)
Date: Thu, 26 May 2011 07:29:47 +0000 (UTC)
From: "sravankorumilli (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <637512827.44597.1306394987492.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1148087811.21961.1304526123120.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (HDFS-1887) Facing problems while restarting the datanode if the datanode format is unsuccessful.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
X-Virus-Checked: Checked by ClamAV on apache.org

     [ https://issues.apache.org/jira/browse/HDFS-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sravankorumilli updated HDFS-1887:
----------------------------------

    Description:
The existing behavior decides whether a DataNode is formatted based on the existence of the version file: if the version file is not present, the storage directory is formatted. If formatting is terminated abruptly, there is a scenario where the storage file or version file is created but its contents are never written. When the DataNode is restarted in that state, it simply throws an exception, and someone has to manually delete the storage directory and restart the DataNode.
One such scenario, where the storage file is created but its contents are not written, produces the exception below.
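The failure mode can be reproduced in isolation: a zero-length file passes an existence check, but reading an int from it throws java.io.EOFException, matching the first frame of the reported stack trace. This is a standalone sketch, not HDFS code; the temporary file name is illustrative.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical reproduction of the reported failure mode: an aborted format
// leaves a zero-length storage file, and the next startup's readInt() hits EOF.
public class EmptyStorageFileDemo {
    public static void main(String[] args) throws IOException {
        File storage = File.createTempFile("storage", "");
        storage.deleteOnExit();
        // The file exists (so an "is it formatted?" existence check passes),
        // but nothing (e.g. no LAYOUTVERSION) was ever written into it.
        try (RandomAccessFile raf = new RandomAccessFile(storage, "r")) {
            int layoutVersion = raf.readInt(); // throws java.io.EOFException
            System.out.println(layoutVersion);
        } catch (java.io.EOFException e) {
            System.out.println("EOFException: storage file exists but is empty");
        }
    }
}
```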
These are the corresponding logs:

2011-05-02 19:12:19,389 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.EOFException
	at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.isConversionNeeded(DataStorage.java:203)
	at org.apache.hadoop.hdfs.server.common.Storage.checkConversionNeeded(Storage.java:697)
	at org.apache.hadoop.hdfs.server.common.Storage.access$000(Storage.java:62)
	at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:476)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:116)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:336)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:260)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:237)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1440)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1393)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1407)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1552)

Our Hadoop cluster is managed by cluster management software that tries to eliminate any manual intervention in setting up and managing the cluster, but in the scenario above, manual intervention is required to recover the DataNode. Though it is very rare, this is possible.

  was: Assume the DataNode is killed before LAYOUTVERSION is written to the storage file. On every subsequent restart, an EOFException is thrown while reading the storage file, and the DataNode cannot be restarted successfully until the storage file is deleted and the DataNode is started once again.
    Summary: Facing problems while restarting the datanode if the datanode format is unsuccessful.  (was: Datanode is not starting: if the DataNode is killed before LAYOUTVERSION is written to the storage file, an EOFException is thrown at restart.)

> Facing problems while restarting the datanode if the datanode format is unsuccessful.
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-1887
>                 URL: https://issues.apache.org/jira/browse/HDFS-1887
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20.1, 0.21.0, 0.23.0
>         Environment: Linux
>            Reporter: sravankorumilli
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
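A common remedy for this class of torn-write problem is to write the version file atomically: write the complete contents to a temporary file, force it to disk, and only then rename it into place, so a kill at any point leaves either no version file (a clean retry of the format) or a complete one. The sketch below is illustrative only, under those assumptions; `writeVersionFileAtomically` is a hypothetical helper, not an actual `DataStorage` method, and this is not necessarily the fix adopted for HDFS-1887.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch of an atomic version-file write; names are hypothetical.
public class AtomicVersionWrite {
    static void writeVersionFileAtomically(File dir, int layoutVersion) throws IOException {
        File tmp = new File(dir, "VERSION.tmp");
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.writeInt(layoutVersion);
            raf.getFD().sync(); // force contents to disk before the rename
        }
        File version = new File(dir, "VERSION");
        // On POSIX filesystems rename() is atomic: a reader sees either no
        // VERSION file or a complete one, never a partially written file.
        if (!tmp.renameTo(version)) {
            throw new IOException("rename of " + tmp + " to " + version + " failed");
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "dn-storage-demo");
        dir.mkdirs();
        writeVersionFileAtomically(dir, -24); // -24 is an example layout version
        try (RandomAccessFile raf = new RandomAccessFile(new File(dir, "VERSION"), "r")) {
            System.out.println("layout version = " + raf.readInt()); // prints "layout version = -24"
        }
    }
}
```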