hadoop-hdfs-issues mailing list archives

From "sravankorumilli (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-1887) If DataNode gets killed before LAYOUTVERSION is being written to the storage file. The further restarts of the DataNode will not succeed an EOFException will be thrown at restart.
Date Thu, 05 May 2011 15:23:03 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sravankorumilli updated HDFS-1887:
----------------------------------

    Description: 
Assume the DataNode is killed before LAYOUTVERSION has been written to the storage file. On
every subsequent restart, the DataNode throws an EOFException while reading the storage file,
and it cannot start successfully until the storage file is deleted and the DataNode is
restarted again.

The corresponding logs are:
2011-05-02 19:12:19,389 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.EOFException
at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.isConversionNeeded(DataStorage.java:203)
at org.apache.hadoop.hdfs.server.common.Storage.checkConversionNeeded(Storage.java:697)
at org.apache.hadoop.hdfs.server.common.Storage.access$000(Storage.java:62)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:476)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:116)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:336)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:260)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:237)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1440)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1393)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1407)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1552)

Our Hadoop cluster is managed by cluster management software that tries to eliminate any
manual intervention in setting up and managing the cluster. In the scenario above, however,
manual intervention is required to recover the DataNode. Although this situation is very
rare, it is possible.
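
The failure mode can be reproduced outside HDFS in a few lines. This is a hypothetical sketch,
not DataNode code: it assumes only what the stack trace shows, namely that the storage file is
read with RandomAccessFile.readInt(), and it simulates the crash by leaving the file shorter
than the 4 bytes an int needs. The class TruncatedStorageFileDemo and its method readIntFails
are illustrative names.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo (not HDFS code): a storage file that was created but never
// received its 4-byte layout version makes readInt() throw EOFException.
public class TruncatedStorageFileDemo {

    // Writes `contents` to a fresh temp file, then tries to read one int from the
    // start of it. The stack trace shows DataStorage.isConversionNeeded calling
    // RandomAccessFile.readInt; the read offset used here is an assumption.
    // Returns true if the read fails with EOFException.
    static boolean readIntFails(byte[] contents) {
        try {
            Path storage = Files.createTempFile("storage", ".tmp");
            storage.toFile().deleteOnExit();
            Files.write(storage, contents);
            try (RandomAccessFile raf = new RandomAccessFile(storage.toFile(), "r")) {
                raf.seek(0);
                raf.readInt(); // needs 4 bytes; throws EOFException if fewer remain
                return false;
            }
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Empty file: the state left behind if the process dies before the write.
        System.out.println("empty file:  EOFException = " + readIntFails(new byte[0]));
        // Complete file: 4 bytes are present, so readInt() succeeds.
        System.out.println("4-byte file: EOFException = " + readIntFails(new byte[4]));
    }
}
```

Running main prints true for the empty file and false for the 4-byte file: readInt() throws
EOFException whenever fewer than 4 bytes remain, which matches the first frame of the trace.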

  was:
Assume the DataNode is killed after 'data.dir' is created, but before LAYOUTVERSION is written
to the storage file. On further restarts of the DataNode, an EOFException is thrown while
reading the storage file, and the DataNode cannot be restarted successfully until 'data.dir'
is deleted.

The corresponding logs are:
2011-05-02 19:12:19,389 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.EOFException
at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.isConversionNeeded(DataStorage.java:203)
at org.apache.hadoop.hdfs.server.common.Storage.checkConversionNeeded(Storage.java:697)
at org.apache.hadoop.hdfs.server.common.Storage.access$000(Storage.java:62)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:476)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:116)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:336)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:260)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:237)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1440)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1393)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1407)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1552)

Our Hadoop cluster is managed by cluster management software that tries to eliminate any
manual intervention in setting up and managing the cluster. In the scenario above, however,
manual intervention is required to recover the DataNode.

       Priority: Major  (was: Minor)
        Summary: If DataNode gets killed before LAYOUTVERSION is being written to the storage
file. The further restarts of the DataNode will not succeed an EOFException will be thrown
at restart.  (was: If DataNode gets killed after 'data.dir' is created, but before LAYOUTVERSION
is written to the storage file. The further restarts of the DataNode, an EOFException will
be thrown while reading the storage file. )

> If DataNode gets killed before LAYOUTVERSION is being written to the storage file. The further restarts of the DataNode will not succeed an EOFException will be thrown at restart.
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1887
>                 URL: https://issues.apache.org/jira/browse/HDFS-1887
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20.1, 0.21.0, 0.23.0
>         Environment: Linux
>            Reporter: sravankorumilli
>
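
One way to let restarts survive this crash is sketched here. It is hypothetical and is not
the fix adopted for HDFS-1887 (none is recorded in this message). readLayoutVersion treats a
storage file shorter than 4 bytes as carrying no layout version instead of propagating the
EOFException, and writeLayoutVersion writes the version to a temporary file in the same
directory, syncs it, and renames it over the target, so a crash cannot leave a half-written
file. The class and method names are illustrative, and the rename assumes a file system that
supports StandardCopyOption.ATOMIC_MOVE (for example, a local Linux file system).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.OptionalInt;

// Hypothetical helpers, not HDFS APIs: a tolerant reader and a crash-safe
// writer for a file holding a single 4-byte layout version.
public class LayoutVersionFile {

    // Returns the layout version, or an empty OptionalInt if the file is too
    // short to hold one (e.g. it was created but never written).
    static OptionalInt readLayoutVersion(Path file) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            if (raf.length() < Integer.BYTES) {
                return OptionalInt.empty();
            }
            raf.seek(0);
            return OptionalInt.of(raf.readInt());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the version to a temp file in the same directory, flushes it to
    // disk, then renames it over the target. On file systems where rename is
    // atomic, readers see either the previous file or the complete new one.
    // The parent directory is not fsynced, and cleanup of the temp file on
    // failure is omitted for brevity.
    static void writeLayoutVersion(Path file, int version) {
        try {
            Path dir = file.toAbsolutePath().getParent();
            Path tmp = Files.createTempFile(dir, "storage", ".tmp");
            try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "rw")) {
                raf.writeInt(version);
                raf.getFD().sync();
            }
            Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("datadir").resolve("storage");
        Files.write(file, new byte[0]);              // simulate the crash: empty file
        System.out.println(readLayoutVersion(file)); // OptionalInt.empty
        writeLayoutVersion(file, -18);               // arbitrary sample value
        System.out.println(readLayoutVersion(file)); // OptionalInt[-18]
    }
}
```

What a caller should do with an empty result (for example, treat the directory as
unformatted) is a policy decision this sketch leaves open; it only removes the exception that
currently requires manual intervention.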

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
