hadoop-hdfs-dev mailing list archives

From "Nathan Roberts (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-11755) Underconstruction blocks can be considered missing
Date Thu, 04 May 2017 17:34:04 GMT
Nathan Roberts created HDFS-11755:

             Summary: Underconstruction blocks can be considered missing
                 Key: HDFS-11755
                 URL: https://issues.apache.org/jira/browse/HDFS-11755
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 3.0.0-alpha2, 2.8.1
            Reporter: Nathan Roberts
            Assignee: Nathan Roberts

The following sequence of events can lead to a block that is still under construction being considered missing.

- pipeline of 3 DNs, DN1->DN2->DN3
- DN3 has a failing disk so some updates take a long time
- Client writes entire block and is waiting for final ack
- DN1, DN2 and DN3 have all received the block
- DN1 is waiting for an ACK from DN2, which is waiting for an ACK from DN3
- DN3 is having trouble finalizing the block due to the failing drive. It does eventually
succeed, but it is VERY slow at doing so.
- DN2 times out waiting for DN3 and tears down its part of the pipeline; DN1 notices
and does the same. Neither DN1 nor DN2 finalizes the block.
- DN3 finally sends an incremental block report (IBR) to the NN indicating the block has been received.
- The drive holding the block on DN3 degrades to the point that DN3 takes the volume offline and notifies
the NN of the failed volume.
- The NN removes DN3's replica from the block's triplets and then declares the block missing because there
are no other replicas.
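The sequence above can be replayed against a toy model of the NameNode's bookkeeping. This is a minimal sketch, not HDFS code: `BlockState`, `BlockRecord`, `naive_is_missing` and `replay_scenario` are hypothetical names, and the model assumes the NN's location list for the block holds only DN3's replica, since only DN3 ever reported the block via an IBR. It shows how a check that looks only at the replica count flags a block that was never completed.

```python
from dataclasses import dataclass, field
from enum import Enum


class BlockState(Enum):
    # Hypothetical, simplified mirror of the NameNode's block states.
    UNDER_CONSTRUCTION = "under_construction"
    COMPLETE = "complete"


@dataclass
class BlockRecord:
    # Hypothetical NN view of one block: its state and the DataNodes
    # whose replicas are recorded for it (the "triplets" in the report).
    block_id: int
    state: BlockState = BlockState.UNDER_CONSTRUCTION
    replicas: set = field(default_factory=set)


def naive_is_missing(block: BlockRecord) -> bool:
    # Flags a block as missing whenever no replica remains,
    # regardless of whether the block was ever completed.
    return len(block.replicas) == 0


def replay_scenario() -> BlockRecord:
    block = BlockRecord(block_id=1)
    # DN1 and DN2 tore down the pipeline without finalizing, so they
    # never reported the block. Only the slow DN3 eventually sends an IBR.
    block.replicas.add("DN3")
    # DN3's volume fails and the NN drops DN3's replica.
    block.replicas.discard("DN3")
    return block


if __name__ == "__main__":
    b = replay_scenario()
    print(b.state.name, naive_is_missing(b))  # UNDER_CONSTRUCTION True
```

The block never left the under-construction state, yet the count-only check reports it as missing.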

It seems we shouldn't consider blocks that have not been completed for replication.
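One possible reading of this suggestion is a guard that skips incomplete blocks when building the missing-block report. The sketch below is illustrative only and is not the patch attached to HDFS-11755: `BlockState`, `BlockRecord`, `is_missing` and `blocks_to_report` are hypothetical names, and it assumes a block that is still under construction should be left to the normal pipeline and block recovery paths rather than reported as missing.

```python
from dataclasses import dataclass, field
from enum import Enum


class BlockState(Enum):
    # Hypothetical, simplified mirror of the NameNode's block states.
    UNDER_CONSTRUCTION = "under_construction"
    COMPLETE = "complete"


@dataclass
class BlockRecord:
    block_id: int
    state: BlockState
    replicas: set = field(default_factory=set)


def is_missing(block: BlockRecord) -> bool:
    # Only a completed block with no remaining replicas counts as missing.
    # Incomplete blocks are skipped because their replica set is not final.
    return block.state is BlockState.COMPLETE and not block.replicas


def blocks_to_report(blocks):
    # Hypothetical scan that builds the missing-block report.
    return [b.block_id for b in blocks if is_missing(b)]


if __name__ == "__main__":
    blocks = [
        BlockRecord(1, BlockState.UNDER_CONSTRUCTION),  # the case in this report
        BlockRecord(2, BlockState.COMPLETE),            # genuinely lost
        BlockRecord(3, BlockState.COMPLETE, {"DN1"}),   # healthy
    ]
    print(blocks_to_report(blocks))  # [2]
```

The trade-off is that a block which truly lost all its data while still under construction is no longer surfaced by this check, so some other path (for example, recovery once the writer's lease expires) would have to handle it.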

This message was sent by Atlassian JIRA
