hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11960) Successfully closed files can stay under-replicated.
Date Fri, 09 Jun 2017 17:46:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044755#comment-16044755 ]

Kihwal Lee commented on HDFS-11960:
-----------------------------------

Details of step 6):
{{processIncrementalBlockReport()}} calls {{addBlock()}} for the received incremental block
report (IBR), which carries the old gen stamp. {{addBlock()}} unconditionally decrements the
pending count for the block.
{code:java}
  void addBlock(DatanodeStorageInfo storageInfo, Block block, String delHint)
      throws IOException {
...
    //
    // Modify the blocks->datanode map and node's map.
    //
    pendingReplications.decrement(block, node);
    processAndHandleReportedBlock(storageInfo, block, ReplicaState.FINALIZED,
        delHintNode);
  }
{code}
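
To make the effect of the unconditional decrement concrete, here is a minimal stand-alone sketch. It does not use the real HDFS classes: {{PendingDecrementSketch}}, its methods, and the gen stamp values are all hypothetical, and the gen stamp guard is only one possible way to ignore a stale report, not necessarily how this issue will be fixed.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-alone sketch; these are NOT the real HDFS classes. */
class PendingDecrementSketch {
  // blockId -> replications still in flight (hypothetical stand-in for the
  // pending replication list).
  private final Map<Long, Integer> pending = new HashMap<>();

  void increment(long blockId) {
    pending.merge(blockId, 1, Integer::sum);
  }

  /** Mirrors the behavior described above: the gen stamp is never consulted. */
  void decrementUnconditionally(long blockId) {
    // Returning null from the remapping function removes the entry.
    pending.computeIfPresent(blockId, (id, n) -> n > 1 ? n - 1 : null);
  }

  /** Assumed guard: only a report with the current gen stamp counts. */
  void decrementIfCurrent(long blockId, long reportedGs, long storedGs) {
    if (reportedGs == storedGs) {
      decrementUnconditionally(blockId);
    }
  }

  int pendingCount(long blockId) {
    return pending.getOrDefault(blockId, 0);
  }

  public static void main(String[] args) {
    long blockId = 1L;
    long storedGs = 1002L; // gen stamp after close recovery (made-up value)
    long staleGs = 1001L;  // gen stamp carried by the delayed IBR (made-up value)

    PendingDecrementSketch unguarded = new PendingDecrementSketch();
    unguarded.increment(blockId);                // replication scheduled
    unguarded.decrementUnconditionally(blockId); // stale IBR arrives
    System.out.println("unguarded pending = " + unguarded.pendingCount(blockId)); // 0

    PendingDecrementSketch guarded = new PendingDecrementSketch();
    guarded.increment(blockId);
    guarded.decrementIfCurrent(blockId, staleGs, storedGs);
    System.out.println("guarded pending = " + guarded.pendingCount(blockId)); // 1
  }
}
```

In the unguarded case the stale report clears the pending entry even though no valid replica was added.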

In {{processAndHandleReportedBlock()}}, the replica is identified as corrupt, so {{markBlockAsCorrupt()}}
is called.

{code:java}
  private void markBlockAsCorrupt(BlockToMarkCorrupt b,
      DatanodeStorageInfo storageInfo,
      DatanodeDescriptor node) throws IOException {
...
    boolean corruptedDuringWrite = minReplicationSatisfied &&
        (b.stored.getGenerationStamp() > b.corrupted.getGenerationStamp());
    // case 1: have enough number of live replicas
    // case 2: corrupted replicas + live replicas > Replication factor
    // case 3: Block is marked corrupt due to failure while writing. In this
    //         case genstamp will be different than that of valid block.
    // In all these cases we can delete the replica.
    // In case of 3, rbw block will be deleted and valid block can be replicated
    if (hasEnoughLiveReplicas || hasMoreCorruptReplicas
        || corruptedDuringWrite) {
      // the block is over-replicated so invalidate the replicas immediately
      invalidateBlock(b, node);
    } else if (namesystem.isPopulatingReplQueues()) {
      // add the block to neededReplication
      updateNeededReplications(b.stored, -1, 0);
    }
  }
{code}

As shown above, this is treated as "case 3" because the stored gen stamp is newer than the
reported one, which causes immediate invalidation of the corrupt replica. No further check on
replication is done.
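
The branch above can be reduced to a pure function to see which path a stale-gen-stamp replica takes. This is only a sketch: {{MarkCorruptDecisionSketch}}, the {{Action}} enum, and {{decide()}} are hypothetical names, the gen stamp values are made up, and the flags whose computation is elided in the excerpt are simply passed in as inputs.

```java
/** Stand-alone sketch of the branch logic; NOT the real BlockManager. */
class MarkCorruptDecisionSketch {
  enum Action { INVALIDATE, QUEUE_FOR_REPLICATION, NONE }

  static Action decide(boolean hasEnoughLiveReplicas,
                       boolean hasMoreCorruptReplicas,
                       boolean minReplicationSatisfied,
                       long storedGs,
                       long reportedGs,
                       boolean populatingReplQueues) {
    // "case 3": the reported replica is older than the stored block.
    boolean corruptedDuringWrite =
        minReplicationSatisfied && storedGs > reportedGs;
    if (hasEnoughLiveReplicas || hasMoreCorruptReplicas
        || corruptedDuringWrite) {
      // Invalidate only; the block is not put back for replication.
      return Action.INVALIDATE;
    } else if (populatingReplQueues) {
      return Action.QUEUE_FOR_REPLICATION;
    }
    return Action.NONE;
  }

  public static void main(String[] args) {
    // Step 6 scenario: too few live replicas, but min replication is met
    // and the reported gen stamp is older than the stored one.
    System.out.println(decide(false, false, true, 1002L, 1001L, true));
    // Same replica counts with matching gen stamps take the other branch.
    System.out.println(decide(false, false, true, 1002L, 1002L, true));
  }
}
```

The first call prints {{INVALIDATE}} and the second prints {{QUEUE_FOR_REPLICATION}}: an under-replicated block reaches the re-queue branch only when "case 3" does not apply.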

> Successfully closed files can stay under-replicated.
> ----------------------------------------------------
>
>                 Key: HDFS-11960
>                 URL: https://issues.apache.org/jira/browse/HDFS-11960
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>
> If a certain set of conditions holds at the time of file creation, a block of the file
> can stay under-replicated. This is because the block is mistakenly taken out of the
> under-replicated block queue and never gets re-evaluated.
> Re-evaluation can be triggered if
> - a node containing a replica dies
> - setrep is called
> - NN repl queues are reinitialized (NN failover or restart)
> If none of these happens, the block stays under-replicated.
> Here is how it happens.
> 1) A replica is finalized, but the ACK does not reach the upstream node in time. The IBR
> is also delayed.
> 2) A close recovery happens, which updates the gen stamp of the "healthy" replicas.
> 3) The file is closed with the healthy replicas. The block is added to the replication
> queue.
> 4) A replication is scheduled, so the block is added to the pending replication list. The
> node that failed in 1) is picked as the replication target.
> 5) The old IBR is finally received for the failed/excluded node. In the meantime, the
> replication fails because there is already a finalized replica (with an older gen stamp)
> on the node.
> 6) The IBR processing removes the block from the pending list, adds it to the corrupt
> replicas list, and then issues an invalidation. Since the block is in neither the
> replication queue nor the pending list, it stays under-replicated.
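
Steps 3) through 6) of the quoted description can be replayed as simple set membership changes. This is a rough sketch, not NameNode code: {{UnderReplicationTimelineSketch}} and its methods are hypothetical, and it assumes that scheduling a replication moves the block from the replication queue to the pending list, which matches the end state the description gives.

```java
import java.util.HashSet;
import java.util.Set;

/** Stand-alone replay of steps 3) to 6); NOT the real NameNode structures. */
class UnderReplicationTimelineSketch {
  final Set<Long> neededReplications = new HashSet<>();
  final Set<Long> pendingReplications = new HashSet<>();
  final Set<Long> corruptReplicas = new HashSet<>();

  // Step 3: the file is closed while the block is still under-replicated.
  void closeFile(long blockId) {
    neededReplications.add(blockId);
  }

  // Step 4 (assumed): scheduling moves the block from queue to pending list.
  void scheduleReplication(long blockId) {
    neededReplications.remove(blockId);
    pendingReplications.add(blockId);
  }

  // Step 6: the stale IBR clears the pending entry and marks the replica
  // corrupt; invalidation follows, but nothing re-queues the block.
  void processStaleIbr(long blockId) {
    pendingReplications.remove(blockId);
    corruptReplicas.add(blockId);
  }

  // True while something will eventually re-evaluate the block.
  boolean isTracked(long blockId) {
    return neededReplications.contains(blockId)
        || pendingReplications.contains(blockId);
  }

  public static void main(String[] args) {
    long blockId = 42L; // made-up block ID
    UnderReplicationTimelineSketch nn = new UnderReplicationTimelineSketch();
    nn.closeFile(blockId);
    nn.scheduleReplication(blockId);
    System.out.println("tracked before stale IBR: " + nn.isTracked(blockId)); // true
    nn.processStaleIbr(blockId);
    System.out.println("tracked after stale IBR: " + nn.isTracked(blockId));  // false
  }
}
```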



