From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9600) do not check replication if the block is under construction
Date Sun, 07 Feb 2016 23:54:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136526#comment-15136526 ]

Yongjun Zhang commented on HDFS-9600:

Thanks for looking into this and commenting, Andrew!

I agree that how the status is set is confusing here.

{quote}
For open files, we only let it decom if the UC block stays above minReplication (default 1).
Note the curReplicas > minReplication check.
{quote}

The {{isReplicationInProgress}} method initializes {{status}} to false. For the replicationFactor==1
case, the check {{curReplicas > minReplication}} will be false, so the method will return
false, and the node that contains the single replica will be marked as decommissioned; the
block will then be lost, unless we over-replicate all replicas on the node before decommissioning
in the replicationFactor==1 case. Am I right? Do we over-replicate before decommissioning?

One point in my earlier comment is: if a replica is currently being written to (under construction),
then in order for us to decommission this node, either (1) the client needs to reconstruct the
write pipeline without this node, or (2) we need to wait for the block to be complete and the
replication count to be satisfied before this node can be decommissioned. Since it may take a
long time to finish writing a block, I think it's likely case 1. If we go with case 1, then for
the replicationFactor==1 case, the client needs to rewrite all data of the same block to another
node. I wonder whether this is the current behavior?
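To make case 1 concrete, a minimal illustrative sketch (hypothetical names and types, not
actual DFSClient code) of rebuilding the pipeline without the decommissioning node:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative only; hypothetical types and names, not HDFS client code.
class PipelineRecoverySketch {
    // Case 1 above: drop the decommissioning node from the write pipeline.
    // The client would then ask the NameNode for a replacement datanode and
    // copy the partial block data to it before the write continues. With
    // replicationFactor == 1 the pipeline had only this one node, so all
    // data written so far must be re-sent to the replacement node.
    static List<String> excludeDecommissioningNode(List<String> pipeline,
                                                   String decomNode) {
        List<String> rebuilt = new ArrayList<>(pipeline);
        rebuilt.remove(decomNode);
        return rebuilt;
    }
}
{code}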

I hope other folks who are more familiar with the code can answer your question. It would be
nice to have some good documentation about this handling.


> do not check replication if the block is under construction
> -----------------------------------------------------------
>                 Key: HDFS-9600
>                 URL: https://issues.apache.org/jira/browse/HDFS-9600
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>            Priority: Critical
>             Fix For: 2.8.0, 2.7.3, 2.6.4
>         Attachments: HDFS-9600-branch-2.6.patch, HDFS-9600-branch-2.7.patch, HDFS-9600-branch-2.patch,
HDFS-9600-v1.patch, HDFS-9600-v2.patch, HDFS-9600-v3.patch, HDFS-9600-v4.patch
> When appending to a file, we will update the pipeline to bump a new GS, and the old GS will be considered out of date. When changing the GS, in BlockInfo.setGenerationStampAndVerifyReplicas we will remove replicas that have the old GS, which means we will remove all replicas, because no DN has the new GS until the block with the new GS is added to the blocksMap again by DatanodeProtocol.blockReceivedAndDeleted.
> If we check the replication of this block before it is added back, it will be regarded as missing. The probability is low, but if there are decommissioning nodes, the DecommissionManager.Monitor will scan all blocks belonging to decommissioning nodes at a very fast speed, so the probability of finding a "missing" block is very high, even though the blocks are not actually missing.
> Furthermore, after closing the appended file, FSNamesystem.finalizeINodeFileUnderConstruction will checkReplication. If some of the nodes are decommissioning, this block with the new GS will be added to the UnderReplicatedBlocks map, so there are two entries with the same block ID in this map, one with the old GS and one with the new GS. And there will be many missing-block warnings on the NameNode website, but there is actually no corrupt file.
> Therefore, I think the solution is that we should not check replication if the block is under construction; we should only check complete blocks.
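A minimal sketch of the fix direction described above (hypothetical method shape, assuming an
isComplete-style check as BlockInfo exposes in HDFS; this is not the attached patch):

{code:java}
// Sketch only: skip the replication check for blocks that are still
// under construction, and only examine complete blocks.
interface BlockView {
    boolean isComplete(); // BlockInfo in HDFS exposes a similar check
}

class ReplicationCheckSketch {
    void checkReplication(BlockView block) {
        if (!block.isComplete()) {
            return; // under construction: skip, do not report as missing
        }
        // ... existing replication check for complete blocks ...
    }
}
{code}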
