hadoop-hdfs-dev mailing list archives

From "Phil Yang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-9600) do not check replication if the block is under construction
Date Fri, 25 Dec 2015 06:58:49 GMT
Phil Yang created HDFS-9600:

             Summary: do not check replication if the block is under construction
                 Key: HDFS-9600
                 URL: https://issues.apache.org/jira/browse/HDFS-9600
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Phil Yang
            Assignee: Phil Yang
            Priority: Critical

When appending to a file, we update the pipeline to bump to a new generation stamp (GS), and the
old GS is considered out of date. When the GS changes, BlockInfo.setGenerationStampAndVerifyReplicas
removes the replicas that have the old GS. In practice this removes all replicas, because no DN has
the new GS until the block with the new GS is added to blockMaps again by
DatanodeProtocol.blockReceivedAndDeleted.
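
The window described above can be illustrated with a toy model. This is only a sketch and not the
real Hadoop code: ToyBlock, its replicas map, and addReplica are hypothetical names, and only the
method name setGenerationStampAndVerifyReplicas is taken from the report.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Toy model only; this is NOT the real org.apache.hadoop BlockInfo class.
class ToyBlock {
    long generationStamp;
    // Hypothetical bookkeeping: datanode name -> GS of the replica it reported.
    final Map<String, Long> replicas = new HashMap<>();

    ToyBlock(long generationStamp) {
        this.generationStamp = generationStamp;
    }

    // Stands in for a blockReceivedAndDeleted report from one datanode.
    // Reports carrying a GS other than the current one are ignored.
    void addReplica(String datanode, long reportedGs) {
        if (reportedGs == generationStamp) {
            replicas.put(datanode, reportedGs);
        }
    }

    // Mirrors the behaviour described in the report: bump the GS, then drop
    // every replica still recorded with an older GS. Returns the number removed.
    int setGenerationStampAndVerifyReplicas(long newGs) {
        generationStamp = newGs;
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = replicas.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() != newGs) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

In this model, a block with three replicas at GS 1001 has zero recorded replicas right after
setGenerationStampAndVerifyReplicas(1002), and stays that way until a datanode reports the
replica again with GS 1002.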

If we check the replication of this block before it is added back, it is regarded as missing.
The probability is normally low, but if there are decommissioning nodes, DecommissionManager.Monitor
scans all blocks belonging to the decommissioning nodes very quickly, so the probability of finding
such a "missing" block is very high, even though the block is not actually missing.

Furthermore, after the appended file is closed, FSNamesystem.finalizeINodeFileUnderConstruction
calls checkReplication, and because some of the nodes are decommissioning, the block with the
new GS is added to the UnderReplicatedBlocks map. There are then two blocks with the same ID in
this map: one in QUEUE_WITH_CORRUPT_BLOCKS and the other in QUEUE_HIGHEST_PRIORITY or
QUEUE_UNDER_REPLICATED. As a result, the NameNode web UI shows many missing-block warnings
even though there are no corrupt files.
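
A minimal sketch of how the same block ID can end up in two queues at once. It assumes each
priority level is an independent set, so adding to one level does not clear the others; that
assumption, and the names ToyQueue and ToyUnderReplicatedBlocks, are illustrative and not taken
from the Hadoop source. Only the three queue names come from the report.

```java
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Only the three queue names mentioned in the report are modelled here.
enum ToyQueue { QUEUE_HIGHEST_PRIORITY, QUEUE_UNDER_REPLICATED, QUEUE_WITH_CORRUPT_BLOCKS }

// Toy stand-in for the UnderReplicatedBlocks map; NOT the real Hadoop class.
class ToyUnderReplicatedBlocks {
    // Assumption: one independent set of block IDs per priority level.
    private final Map<ToyQueue, Set<Long>> queues = new EnumMap<>(ToyQueue.class);

    ToyUnderReplicatedBlocks() {
        for (ToyQueue q : ToyQueue.values()) {
            queues.put(q, new HashSet<>());
        }
    }

    // Adds the block ID to the given level without touching the other levels.
    void add(long blockId, ToyQueue queue) {
        queues.get(queue).add(blockId);
    }

    // Number of levels that currently hold this block ID.
    int countEntries(long blockId) {
        int count = 0;
        for (Set<Long> ids : queues.values()) {
            if (ids.contains(blockId)) {
                count++;
            }
        }
        return count;
    }
}
```

Adding block 42 to QUEUE_WITH_CORRUPT_BLOCKS during the replica-less window and then to
QUEUE_UNDER_REPLICATED when the file is closed leaves countEntries(42) at 2 in this model, which
matches the duplicate entries described above.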

Therefore, I think the solution is to not check replication while the block is under construction.
We should only check complete blocks.
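
The proposed guard could look roughly like the following. This is a sketch of the idea only, not
the actual patch: ToyBlockState, ToyVerdict, and ToyReplicationChecker are hypothetical names, and
the thresholds used for "missing" and "under-replicated" are simplifying assumptions.

```java
// Toy types; the real change would live in the NameNode's block management code.
enum ToyBlockState { UNDER_CONSTRUCTION, COMPLETE }

enum ToyVerdict { SKIPPED, MISSING, UNDER_REPLICATED, OK }

class ToyReplicationChecker {
    // Only complete blocks are checked. A block under construction may briefly
    // have zero recorded replicas after a GS bump, so it is skipped, not flagged.
    static ToyVerdict check(ToyBlockState state, int liveReplicas, int expectedReplicas) {
        if (state != ToyBlockState.COMPLETE) {
            return ToyVerdict.SKIPPED;
        }
        if (liveReplicas == 0) {
            return ToyVerdict.MISSING; // assumption: zero live replicas means missing
        }
        if (liveReplicas < expectedReplicas) {
            return ToyVerdict.UNDER_REPLICATED;
        }
        return ToyVerdict.OK;
    }
}
```

With this guard, a block that is still under construction and has zero recorded replicas is
skipped instead of being reported as missing, while a complete block with zero replicas is still
flagged.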
