hadoop-hdfs-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6791) A block could remain under replicated if all of its replicas are on decommissioned nodes
Date Thu, 31 Jul 2014 06:53:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080580#comment-14080580 ]

Ming Ma commented on HDFS-6791:
-------------------------------

Use the following steps to reproduce the problem:

1. Create a situation where node A holds several blocks with a replication factor of 1.
2. Start decommissioning node A. Right after the decommission process starts, kill the DN
JVM on node A.
3. Wait until the NN marks node A dead. After that, the NN will mark the node as decommissioned,
because decommission is considered done when no blocks are left for the DN. Given that
node A hasn't finished copying its blocks, there will be missing blocks at this point.

{noformat}
BlockManager.java
  boolean isReplicationInProgress(DatanodeDescriptor srcNode) {
    boolean status = false;
    ...
    final Iterator<? extends Block> it = srcNode.getBlockIterator();
    while(it.hasNext()) {
      ...
      // set status if there is a block under replication
    }
    ...
    return status;
  }
{noformat}

4. Restart node A. Upon datanode registration, no decommission is performed because the node
is already in the decommissioned state. Node A therefore stays decommissioned and its blocks
aren't copied to other nodes.
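
The sequence in steps 1 to 4 can be modeled with a small, self-contained sketch. This is
not the HDFS code: ToyNode, AdminState, needsCopy, and checkDecommission are hypothetical
names, and the sketch assumes that the NN drops a dead node's blocks before the next
decommission check. The only behavior taken from the snippet above is that the status flag
is set inside the block loop, so an empty block list reads as "nothing in progress".

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the repro steps; none of these types are HDFS classes.
class DecommissionRepro {
  enum AdminState { IN_SERVICE, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

  static final class ToyNode {
    AdminState state = AdminState.IN_SERVICE;
    // Blocks the NN currently associates with this node.
    final Set<String> blocks = new HashSet<>();
  }

  // Same shape as the quoted method: status is only set inside the loop,
  // so a node with no blocks reports that no replication is in progress.
  static boolean isReplicationInProgress(ToyNode node, Set<String> needsCopy) {
    boolean status = false;
    for (String block : node.blocks) {
      if (needsCopy.contains(block)) {
        status = true;
      }
    }
    return status;
  }

  // Periodic check: finish decommission once nothing is left to replicate.
  static void checkDecommission(ToyNode node, Set<String> needsCopy) {
    if (node.state == AdminState.DECOMMISSION_IN_PROGRESS
        && !isReplicationInProgress(node, needsCopy)) {
      node.state = AdminState.DECOMMISSIONED;
    }
  }

  // Runs steps 1-4 and returns the final admin state of node A.
  static AdminState runScenario() {
    ToyNode a = new ToyNode();
    a.blocks.add("blk_1");                         // step 1: only replica is on A
    Set<String> needsCopy = new HashSet<>();
    needsCopy.add("blk_1");                        // must be copied before A can leave
    a.state = AdminState.DECOMMISSION_IN_PROGRESS; // step 2: start decommission
    a.blocks.clear();                              // steps 2-3: DN killed, marked dead
    checkDecommission(a, needsCopy);               // step 3: empty list reads as "done"
    a.blocks.add("blk_1");                         // step 4: DN restarts, reports blk_1
    checkDecommission(a, needsCopy);               // no effect: already DECOMMISSIONED
    return a.state;
  }

  public static void main(String[] args) {
    System.out.println(runScenario()); // prints DECOMMISSIONED
  }
}
```

In this model blk_1 is still in needsCopy at the end, yet node A is DECOMMISSIONED, which is
the state described in step 4.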


Some ideas on how to fix it:

1. When a DN dies during decommission, the NN can keep the DN marked as "decommission-in-progress".
That will allow the DN to resume the decommission process when it rejoins the cluster.

2. Another approach could be to relax the definition of the "decommissioned" state so that when
BlockManager chooses a source datanode for replication, it can choose a "decommissioned" node
under special conditions, e.g., when no other datanode is available.
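
Both ideas can be sketched in a few lines. This is only an illustration under stated
assumptions, not a patch: ToyNode, AdminState, nextState, and chooseSource are hypothetical
names, and the real chooseSourceDatanode applies other checks that are left out here.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of both ideas; these types are not HDFS classes.
class DecommissionFixSketch {
  enum AdminState { IN_SERVICE, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

  static final class ToyNode {
    final String name;
    final AdminState state;
    final boolean alive;

    ToyNode(String name, AdminState state, boolean alive) {
      this.name = name;
      this.state = state;
      this.alive = alive;
    }
  }

  // Idea 1: a dead node never completes decommission, so it resumes
  // in DECOMMISSION_IN_PROGRESS when it rejoins the cluster.
  static AdminState nextState(AdminState current, boolean alive,
                              boolean replicationInProgress) {
    if (current != AdminState.DECOMMISSION_IN_PROGRESS) {
      return current;
    }
    if (!alive || replicationInProgress) {
      return AdminState.DECOMMISSION_IN_PROGRESS;
    }
    return AdminState.DECOMMISSIONED;
  }

  // Idea 2: prefer any live node that is not decommissioned; use a live
  // decommissioned node only when no other source is available.
  static ToyNode chooseSource(List<ToyNode> replicaHolders) {
    ToyNode lastResort = null;
    for (ToyNode node : replicaHolders) {
      if (!node.alive) {
        continue;
      }
      if (node.state == AdminState.DECOMMISSIONED) {
        if (lastResort == null) {
          lastResort = node;
        }
        continue;
      }
      return node;
    }
    return lastResort; // null when no live replica holder exists
  }

  public static void main(String[] args) {
    ToyNode a = new ToyNode("A", AdminState.DECOMMISSIONED, true);
    System.out.println(chooseSource(Collections.singletonList(a)).name); // prints A
  }
}
```

With idea 1 the node never reaches the decommissioned state while it is dead; with idea 2
it may reach that state, but its replica can still be used as a source of last resort.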

Suggestions?


> A block could remain under replicated if all of its replicas are on decommissioned nodes
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6791
>                 URL: https://issues.apache.org/jira/browse/HDFS-6791
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ming Ma
>
> Here is the scenario.
> 1. Normally, before the NN transitions a DN to the decommissioned state, enough replicas have
> been copied to other "in service" DNs. However, in some rare situations, the cluster gets into
> a state where a DN is decommissioned and a block's only replica is on that DN. In
> that state, the replica count reported by fsck is 1; the block just stays
> under-replicated; applications can still read the data, given that a decommissioned node can
> serve read traffic.
> This can happen in some error situations such as DN failure or NN failover. For example:
> a) A block's only replica is temporarily on node A.
> b) The decommission process is started on node A.
> c) While node A is in the "decommission-in-progress" state, node A crashes. The NN marks node
> A as dead.
> d) After node A rejoins the cluster, the NN marks node A as decommissioned.
> 2. In theory, the NN should take care of under-replicated blocks. But it doesn't in this
> special case where the only replica is on a decommissioned node, because the NN has the
> policy that "a decommissioned node can't be picked as the source node for replication".
> {noformat}
> BlockManager.java
> chooseSourceDatanode
>       // never use already decommissioned nodes
>       if(node.isDecommissioned())
>         continue;
> {noformat}
> 3. Given that the NN marks the node as decommissioned, admins will shut down the datanode.
> Under-replicated blocks then turn into missing blocks.
> 4. The workaround is to recommission the node so that the NN can start replication from
> the node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
