hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9600) do not check replication if the block is under construction
Date Sat, 06 Feb 2016 18:38:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135936#comment-15135936 ]

Yongjun Zhang commented on HDFS-9600:
-------------------------------------

Thanks [~yangzhe1991]!

I studied the code further and have the following thoughts:

{code}
  /**
   * Return true if there are any blocks on this node that have not
   * yet reached their replication factor. Otherwise returns false.
   */
  boolean isReplicationInProgress(DatanodeDescriptor srcNode) {
{code}
As you said, this method is used by DatanodeManager.checkDecommissionState: if none of the blocks
held by a decommissioning node still needs replication (isReplicationInProgress returns false),
the node will be marked decommissioned. The current criterion is:
{code}
        if (curReplicas < curExpectedReplicas
            || !isPlacementPolicySatisfied(block)) {
{code}
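A minimal sketch of how this per-block criterion feeds the per-node decision, assuming a simplified block record. The `Block` type, its fields, and the helper names are hypothetical stand-ins, not the real HDFS BlockManager or DatanodeManager classes.

```java
import java.util.List;

/**
 * Hypothetical model of the decommission check: a node may be released only
 * when none of its blocks still needs replication. Not real HDFS code.
 */
public class DecommissionCheckSketch {

    /** Minimal per-block state the criterion looks at (hypothetical fields). */
    record Block(long id, int curReplicas, int expectedReplicas,
                 boolean placementSatisfied) {}

    /** Same shape as the quoted criterion: too few replicas or a bad placement. */
    static boolean needsReplication(Block b) {
        return b.curReplicas() < b.expectedReplicas() || !b.placementSatisfied();
    }

    /** True if any block on the node still needs replication. */
    static boolean isReplicationInProgress(List<Block> blocksOnNode) {
        return blocksOnNode.stream().anyMatch(DecommissionCheckSketch::needsReplication);
    }

    /** The node may be marked decommissioned only when nothing is in progress. */
    static boolean canMarkDecommissioned(List<Block> blocksOnNode) {
        return !isReplicationInProgress(blocksOnNode);
    }

    public static void main(String[] args) {
        List<Block> node = List.of(
            new Block(1, 3, 3, true),   // fully replicated
            new Block(2, 2, 3, true));  // one replica short
        System.out.println(canMarkDecommissioned(node)); // false: block 2 still needs a copy
    }
}
```

Note that the record has no notion of block completeness at all; the decision depends only on replica counts and placement.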
We do not check whether the block is complete, because the main concern is whether we have
enough replicas. However, I found a couple of things here:

1. If the block is not complete, especially if it is being written to right now, it seems we
can still decommission this node (isReplicationInProgress returns false). That may be OK when
the replication factor is greater than 1 (the remaining replicas can carry on the ongoing
write), but if it is 1, we would lose the only replica, and with it the block. Isn't that a
problem?

2. Similarly, if the block is complete and the replication factor is 1, isReplicationInProgress
will return false and we can still decommission the node.
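Whether the replication-factor-1 cases above can actually slip through depends on which replicas `curReplicas` counts. The sketch below, using a hypothetical `Replica` record and counting helpers (not the real HDFS types), contrasts counting every replica with counting only replicas on nodes that are not decommissioning; with factor 1, only the second policy keeps the check from releasing the node.

```java
import java.util.List;

/**
 * Hypothetical illustration: whether a factor-1 block holds up decommission
 * depends on whether the replica on the draining node is counted.
 */
public class ReplicaCountingSketch {

    /** A replica location; decommissioning marks the node being drained (hypothetical). */
    record Replica(String datanode, boolean decommissioning) {}

    /** Counts every replica, including the one on the decommissioning node. */
    static int countAll(List<Replica> replicas) {
        return replicas.size();
    }

    /** Counts only replicas that will survive once decommissioning finishes. */
    static int countSurviving(List<Replica> replicas) {
        return (int) replicas.stream().filter(r -> !r.decommissioning()).count();
    }

    /** Same shape as the quoted criterion, minus the placement check. */
    static boolean needsReplication(int curReplicas, int expectedReplicas) {
        return curReplicas < expectedReplicas;
    }

    public static void main(String[] args) {
        // Replication factor 1, and the only replica lives on the node being decommissioned.
        List<Replica> replicas = List.of(new Replica("dn1", true));
        int expected = 1;
        System.out.println(needsReplication(countAll(replicas), expected));       // false: node could be released, losing the block
        System.out.println(needsReplication(countSurviving(replicas), expected)); // true: decommission waits for a new copy
    }
}
```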

So it seems that we need to handle replication factor 1 with extra care. One solution would be
to over-replicate first (copy the replica to another node), then decommission the node. I am not
sure whether the decommissioner already handles that.
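The "over-replicate first, then decommission" idea can be sketched as a planning step that, for every block on the draining node, schedules enough copies elsewhere so that the replicas outside that node reach the expected factor. The block map, node names, and simple first-fit target choice are all hypothetical; a real implementation would also respect the placement policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical "over-replicate first, then decommission" planner.
 * Block IDs, node names, and target selection are illustrative only.
 */
public class OverReplicateThenDecommission {

    /**
     * Returns the copies to schedule, as "blockId->target" strings, so that every
     * block on drainingNode has its expected number of replicas on other nodes.
     */
    static List<String> planCopies(Map<Long, Set<String>> blockLocations,
                                   Map<Long, Integer> expectedReplicas,
                                   String drainingNode,
                                   List<String> candidateTargets) {
        List<String> copies = new ArrayList<>();
        for (Map.Entry<Long, Set<String>> e : blockLocations.entrySet()) {
            Set<String> locations = e.getValue();
            if (!locations.contains(drainingNode)) {
                continue; // block does not live on the node being decommissioned
            }
            long surviving = locations.stream().filter(n -> !n.equals(drainingNode)).count();
            int missing = (int) (expectedReplicas.get(e.getKey()) - surviving);
            for (String target : candidateTargets) {
                if (missing <= 0) {
                    break;
                }
                // First-fit: pick any candidate that does not already hold the block.
                if (!locations.contains(target) && !target.equals(drainingNode)) {
                    copies.add(e.getKey() + "->" + target);
                    missing--;
                }
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        Map<Long, Set<String>> locations = new TreeMap<>();
        locations.put(1L, Set.of("dn1"));          // factor 1, only copy on dn1
        locations.put(2L, Set.of("dn1", "dn2"));   // factor 2
        Map<Long, Integer> expected = Map.of(1L, 1, 2L, 2);
        System.out.println(planCopies(locations, expected, "dn1", List.of("dn2", "dn3")));
        // prints [1->dn2, 2->dn3]
    }
}
```

Under this sketch the node would be marked decommissioned only after every planned copy has been reported back, so a factor-1 block never drops to zero surviving replicas.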

Thoughts?

Thanks.


> do not check replication if the block is under construction
> -----------------------------------------------------------
>
>                 Key: HDFS-9600
>                 URL: https://issues.apache.org/jira/browse/HDFS-9600
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>            Priority: Critical
>             Fix For: 2.8.0, 2.7.3, 2.6.4
>
>         Attachments: HDFS-9600-branch-2.6.patch, HDFS-9600-branch-2.7.patch, HDFS-9600-branch-2.patch, HDFS-9600-v1.patch, HDFS-9600-v2.patch, HDFS-9600-v3.patch, HDFS-9600-v4.patch
>
>
> When appending to a file, we update the pipeline to bump a new generation stamp (GS), and the
> old GS is then considered out of date. When the GS changes,
> BlockInfo.setGenerationStampAndVerifyReplicas removes the replicas having the old GS, which
> means all replicas are removed, because no DN has the new GS until the block with the new GS
> is added to blockMaps again by DatanodeProtocol.blockReceivedAndDeleted.
> If we check replication of this block before it is added back, it will be regarded as
> missing. The probability is normally low, but if there are decommissioning nodes,
> DecommissionManager.Monitor scans all blocks belonging to those nodes very quickly, so the
> probability of finding such "missing" blocks is very high, even though they are not actually
> missing.
> Furthermore, after the appended file is closed, FSNamesystem.finalizeINodeFileUnderConstruction
> calls checkReplication. If some of the nodes are decommissioning, the block with the new GS
> will be added to the UnderReplicatedBlocks map, so there are two blocks with the same ID in
> this map: one in QUEUE_WITH_CORRUPT_BLOCKS and the other in QUEUE_HIGHEST_PRIORITY or
> QUEUE_UNDER_REPLICATED. As a result, the NameNode web UI shows many missing-block warnings,
> even though there are no corrupt files.
> Therefore, I think the solution is to not check replication while the block is under
> construction, and to check only complete blocks.
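The race in the issue description, and the proposed fix, can be modeled in a few lines. This is a hypothetical sketch, not HDFS code: `BlockInfo`, `reportReplica`, and the two "missing" checks are simplified stand-ins that only mimic the behavior described (a GS bump drops every stale replica until DataNodes report the new GS, and the fix skips the check for under-construction blocks).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the append race: bumping the generation stamp (GS)
 * drops every stale replica, so the block looks missing until DataNodes
 * report the new GS. Names are illustrative, not real HDFS classes.
 */
public class GenerationStampRaceSketch {

    enum State { COMPLETE, UNDER_CONSTRUCTION }

    static final class BlockInfo {
        long generationStamp;
        State state = State.COMPLETE;
        final Map<String, Long> replicaGs = new HashMap<>(); // datanode -> reported GS

        /** Bumps the GS and removes replicas still carrying an older GS. */
        void setGenerationStampAndVerifyReplicas(long newGs) {
            generationStamp = newGs;
            replicaGs.values().removeIf(gs -> gs != newGs);
        }

        /** A DataNode re-reports its replica (standing in for blockReceivedAndDeleted). */
        void reportReplica(String datanode, long gs) {
            if (gs == generationStamp) {
                replicaGs.put(datanode, gs);
            }
        }
    }

    /** Naive check: any block with zero known replicas is reported as missing. */
    static boolean naiveLooksMissing(BlockInfo b) {
        return b.replicaGs.isEmpty();
    }

    /** Proposed check: only complete blocks are checked. */
    static boolean looksMissingSkippingUc(BlockInfo b) {
        return b.state == State.COMPLETE && b.replicaGs.isEmpty();
    }

    public static void main(String[] args) {
        BlockInfo b = new BlockInfo();
        b.generationStamp = 1001;
        b.replicaGs.put("dn1", 1001L);
        b.replicaGs.put("dn2", 1001L);

        // Append: the block is reopened and the pipeline bumps the GS.
        b.state = State.UNDER_CONSTRUCTION;
        b.setGenerationStampAndVerifyReplicas(1002);
        System.out.println(naiveLooksMissing(b));      // true: false alarm
        System.out.println(looksMissingSkippingUc(b)); // false: check skipped

        // DataNodes report the new GS; after close the block is complete again.
        b.reportReplica("dn1", 1002);
        b.reportReplica("dn2", 1002);
        b.state = State.COMPLETE;
        System.out.println(looksMissingSkippingUc(b)); // false
    }
}
```

The design choice mirrored here is that "under construction" is treated as a window in which replica counts are not trustworthy, so the check is deferred until the block is complete.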



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
