hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2290) Block with corrupt replica is not getting replicated
Date Fri, 26 Aug 2011 23:39:29 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092134#comment-13092134 ]

Konstantin Shvachko commented on HDFS-2290:

Two things.
# I set replication to 2 for the corrupt file on a 3-node cluster, hoping that the corrupt
replica would be removed so that I could then set replication back to 3. fsck shows a healthy
file, but the NN does not even try to delete the corrupt replica. The DN keeps reporting the
corrupt replica, and when I set replication back to 3, I end up where I started.
The general problem seems to be that the NN does not schedule deletion of corrupt replicas when
you lower the replication of the block.
# I started a 4th DN. Replication is triggered as expected, and then removal of the corrupt
replica is scheduled, but the latter fails with the following exception:
{code}
11/08/26 16:19:38 WARN datanode.DataNode: Unexpected error trying to delete block blk_-4767793772698703708_1816. BlockInfo not found in volumeMap.
11/08/26 16:19:38 WARN datanode.DataNode: Error processing datanode Command
java.io.IOException: Error in deleting blocks.
	at org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:1681)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:1021)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:983)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:920)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1439)
	at java.lang.Thread.run(Unknown Source){code}
I think the DN can ignore the absence of the metadata file if it is deleting the replica anyway.
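
The suggestion above can be illustrated with a minimal Java sketch. It is not the actual FSDataset code: the class TolerantInvalidate, its string-valued volumeMap, and both method names are hypothetical, and the strict variant only mimics the symptom visible in the log (an IOException for a block that is not in the map). The lenient variant treats "already absent" as success for a delete request and merely reports which blocks were missing.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a DataNode's block map. This is NOT the real FSDataset.
class TolerantInvalidate {
    // Block id -> replica location; a plain string keeps the sketch self-contained.
    private final Map<Long, String> volumeMap = new HashMap<>();

    void add(long blockId, String location) {
        volumeMap.put(blockId, location);
    }

    boolean contains(long blockId) {
        return volumeMap.containsKey(blockId);
    }

    // Strict variant: a block missing from the map fails the whole command,
    // which is the symptom shown in the log (assumed behavior, for illustration).
    void invalidateStrict(long[] blockIds) throws IOException {
        boolean error = false;
        for (long id : blockIds) {
            if (volumeMap.remove(id) == null) {
                error = true;
            }
        }
        if (error) {
            throw new IOException("Error in deleting blocks.");
        }
    }

    // Lenient variant: the goal is "this block must not exist here", so a block
    // that is already absent counts as done. The absent ids are returned so the
    // caller can still log them.
    List<Long> invalidateLenient(long[] blockIds) {
        List<Long> alreadyGone = new ArrayList<>();
        for (long id : blockIds) {
            if (volumeMap.remove(id) == null) {
                alreadyGone.add(id);
            }
        }
        return alreadyGone;
    }

    public static void main(String[] args) {
        TolerantInvalidate store = new TolerantInvalidate();
        store.add(1L, "/data/blk_1");
        // Block 2 was never stored; the lenient call reports it instead of throwing.
        System.out.println(store.invalidateLenient(new long[] {1L, 2L}));
    }
}
```

Returning the missing ids rather than swallowing them keeps the condition visible in the logs while letting the rest of the command succeed.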

> Block with corrupt replica is not getting replicated
> ----------------------------------------------------
>                 Key: HDFS-2290
>                 URL: https://issues.apache.org/jira/browse/HDFS-2290
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Konstantin Shvachko
>             Fix For: 0.22.0
> A block has one replica marked as corrupt and two good ones. countNodes() correctly detects
> that there are only 2 live replicas, and fsck reports the block as under-replicated. But
> ReplicationMonitor never schedules replication of good replicas.
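
The arithmetic in the description can be written as a small Java sketch. It is illustrative only and not NameNode code: ReplicationCheck, ReplicaState, countLive and additionalReplicasNeeded are hypothetical names. It only shows that a corrupt replica should not count toward the replication target, so one corrupt and two good replicas with a target of 3 still need one more copy.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; it does not reproduce countNodes() or ReplicationMonitor.
class ReplicationCheck {
    enum ReplicaState { LIVE, CORRUPT }

    // Count only replicas that can serve as a source for replication.
    static int countLive(List<ReplicaState> replicas) {
        int live = 0;
        for (ReplicaState state : replicas) {
            if (state == ReplicaState.LIVE) {
                live++;
            }
        }
        return live;
    }

    // Number of new copies required to reach the expected replication factor.
    // Corrupt replicas are ignored, so they never mask under-replication.
    static int additionalReplicasNeeded(List<ReplicaState> replicas, int expected) {
        return Math.max(0, expected - countLive(replicas));
    }

    public static void main(String[] args) {
        List<ReplicaState> replicas =
            Arrays.asList(ReplicaState.LIVE, ReplicaState.LIVE, ReplicaState.CORRUPT);
        System.out.println(additionalReplicasNeeded(replicas, 3));
    }
}
```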

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira