hadoop-hdfs-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7235) DataNode#transferBlock should report blocks that don't exist using reportBadBlock
Date Wed, 29 Oct 2014 13:35:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188335#comment-14188335 ]

Hudson commented on HDFS-7235:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/])
HDFS-7235. DataNode#transferBlock should report blocks that don't exist using reportBadBlock
(yzhang via cmccabe) (cmccabe: rev ac9ab037e9a9b03e4fa9bd471d3ab9940beb53fb)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReplication.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/UnexpectedReplicaStateException.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

> DataNode#transferBlock should report blocks that don't exist using reportBadBlock
> ---------------------------------------------------------------------------------
>                 Key: HDFS-7235
>                 URL: https://issues.apache.org/jira/browse/HDFS-7235
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.6.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>             Fix For: 2.7.0
>         Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch, HDFS-7235.004.patch,
HDFS-7235.005.patch, HDFS-7235.006.patch, HDFS-7235.007.patch, HDFS-7235.007.patch
> When decommissioning a DN, the process hangs.
> What happens is: when the NN chooses a replica as the source for replicating data from the
> to-be-decommissioned DN to other DNs, it favors the to-be-decommissioned DN itself as the
> transfer source (see BlockManager.java). However, because of a bad disk, the DN detects the
> source block to be transferred as an invalid block, via the following logic in FsDatasetImpl.java:
> {code}
>   /** Does the block exist and have the given state? */
>   private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
>     final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), 
>         b.getLocalBlock());
>     return replicaInfo != null
>         && replicaInfo.getState() == state
>         && replicaInfo.getBlockFile().exists();
>   }
> {code}
> This method returns false (i.e., the block is detected as invalid) because the block file
> doesn't exist, due to the bad disk in this case.
> The key issue we found is that after the DN detects an invalid block for the above reason,
> it doesn't report the invalid block back to the NN. The NN therefore doesn't know the block
> is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN,
> again and again. This causes an infinite loop, so the decommission process hangs.
> Thanks [~qwertymaniac] for reporting the issue and initial analysis.
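The core idea of the fix can be sketched as follows. This is a hypothetical, heavily simplified illustration, not the actual Hadoop code: `Dataset`, `NameNodeClient`, and `TransferBlockSketch` are stand-ins for FsDatasetSpi, the DatanodeProtocol RPC, and DataNode#transferBlock. The point is that on an invalid replica the DN now reports the bad block to the NN rather than silently dropping the request, so the NN can pick a different source.

```java
import java.util.ArrayList;
import java.util.List;

public class TransferBlockSketch {
    // Stand-in for FsDatasetSpi#isValid (assumption, not the real interface).
    interface Dataset { boolean isValid(String block); }

    // Stand-in for the NN-side RPC; records reported bad blocks for inspection.
    static class NameNodeClient {
        final List<String> reportedBadBlocks = new ArrayList<>();
        void reportBadBlock(String block) { reportedBadBlocks.add(block); }
    }

    final Dataset dataset;
    final NameNodeClient nameNode;

    TransferBlockSketch(Dataset dataset, NameNodeClient nameNode) {
        this.dataset = dataset;
        this.nameNode = nameNode;
    }

    /** Returns true if the transfer starts, false if the block was reported bad. */
    boolean transferBlock(String block) {
        if (!dataset.isValid(block)) {
            // Before HDFS-7235 the DN effectively just returned here, so the NN
            // kept re-selecting this DN as the replication source forever.
            nameNode.reportBadBlock(block);
            return false;
        }
        // ... start the actual block transfer (omitted) ...
        return true;
    }

    public static void main(String[] args) {
        NameNodeClient nn = new NameNodeClient();
        // Simulate a bad disk: every replica lookup fails.
        TransferBlockSketch dn = new TransferBlockSketch(b -> false, nn);
        boolean started = dn.transferBlock("blk_1001");
        System.out.println("started=" + started + " reported=" + nn.reportedBadBlocks);
        // prints: started=false reported=[blk_1001]
    }
}
```

With the report in place, the NN learns the replica is corrupt and stops asking the same DN, which breaks the infinite loop described above.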

This message was sent by Atlassian JIRA
