hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2791) If block report races with closing of file, replica is incorrectly marked corrupt
Date Fri, 27 Jan 2012 20:50:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195086#comment-13195086 ]

Todd Lipcon commented on HDFS-2791:
-----------------------------------

bq.  I am coming to the conclusion that when a NN asks a DN to delete a replica, in addition
to the block ID and generation stamp, it should also include the state (RBW etc.) known to the NN.
The block is deleted only if it is in that state.

Good idea - I like this safeguard. But given that there are +1s on the patch here, and the
safeguard isn't mutually exclusive with it, let's do both for extra safety.
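
To make the proposed safeguard concrete, here is a minimal sketch in Python. It is a toy model, not HDFS code: the names DataNodeModel, Replica and delete_if_matches are hypothetical, and it only assumes what the quoted proposal says, namely that a delete request carries the block ID, the generation stamp and the replica state the NN believes the DN has.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicaState(Enum):
    RBW = "RBW"              # replica being written
    FINALIZED = "FINALIZED"  # replica closed on the DN


@dataclass
class Replica:
    block_id: int
    gen_stamp: int
    state: ReplicaState


class DataNodeModel:
    """Hypothetical stand-in for a DN's replica map (not real HDFS code)."""

    def __init__(self):
        self.replicas = {}

    def add(self, replica):
        self.replicas[replica.block_id] = replica

    def delete_if_matches(self, block_id, gen_stamp, expected_state):
        """Delete the replica only if it matches what the NN believes."""
        replica = self.replicas.get(block_id)
        if replica is None:
            return False
        if replica.gen_stamp != gen_stamp or replica.state != expected_state:
            # The NN's view is stale, so refuse to delete.
            return False
        del self.replicas[block_id]
        return True


dn = DataNodeModel()
dn.add(Replica(block_id=1, gen_stamp=1001, state=ReplicaState.FINALIZED))

# The NN still thinks the replica is RBW: the request is ignored.
stale = dn.delete_if_matches(1, 1001, ReplicaState.RBW)
# The NN's view agrees with the DN: the replica is removed.
fresh = dn.delete_if_matches(1, 1001, ReplicaState.FINALIZED)
```

In this model a delete issued from a stale view of the replica is a no-op, which is the extra safety the proposal is after.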

Assuming this patch still applies, I'll commit it momentarily.
                
> If block report races with closing of file, replica is incorrectly marked corrupt
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-2791
>                 URL: https://issues.apache.org/jira/browse/HDFS-2791
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, name-node
>    Affects Versions: 0.22.0, 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-2791-test.txt, hdfs-2791.txt, hdfs-2791.txt, hdfs-2791.txt, hdfs-2791.txt
>
>
> The following sequence of events results in a replica being mistakenly marked corrupt:
> 1. Pipeline is open with 2 replicas.
> 2. DN1 generates a block report but is slow in sending it to the NN (e.g., because of a flaky network). It gets "stuck" right before the block report RPC.
> 3. Client closes the file.
> 4. DN2 is fast and sends blockReceived to the NN. NN marks the block as COMPLETE.
> 5. DN1's block report proceeds, and includes the block in an RBW state.
> 6. (x) NN incorrectly marks the replica as corrupt, since it is an RBW replica on a COMPLETE block.
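
The sequence above can be replayed with a small Python model. This is a simplified sketch, not the NameNode implementation: NameNodeModel and its methods are hypothetical names, and the model assumes, as in step 4, that a single blockReceived is enough for the block to become COMPLETE. It reproduces only the faulty behaviour described in the report and makes no claim about how the attached patch changes it.

```python
from enum import Enum


class BlockState(Enum):
    UNDER_CONSTRUCTION = 1
    COMPLETE = 2


class ReplicaState(Enum):
    RBW = 1        # replica being written
    FINALIZED = 2


class NameNodeModel:
    """Hypothetical, heavily simplified NN bookkeeping."""

    def __init__(self):
        self.block_state = {}
        self.corrupt = set()  # (datanode, block_id) pairs

    def open_pipeline(self, block_id):
        self.block_state[block_id] = BlockState.UNDER_CONSTRUCTION

    def block_received(self, datanode, block_id):
        # Simplification: one blockReceived completes the block.
        self.block_state[block_id] = BlockState.COMPLETE

    def process_block_report(self, datanode, report):
        # Naive check from the bug description: an RBW replica
        # reported for a COMPLETE block is treated as corrupt.
        for block_id, replica_state in report:
            if (self.block_state[block_id] is BlockState.COMPLETE
                    and replica_state is ReplicaState.RBW):
                self.corrupt.add((datanode, block_id))


nn = NameNodeModel()
nn.open_pipeline(1)                    # step 1: pipeline is open
stale_report = [(1, ReplicaState.RBW)] # step 2: DN1 snapshots its replicas, then stalls
# step 3: the client closes the file (replica finalization is not modeled)
nn.block_received("DN2", 1)            # step 4: block becomes COMPLETE
nn.process_block_report("DN1", stale_report)  # step 5: the delayed report arrives
# step 6: nn.corrupt now contains ("DN1", 1) although DN1's replica is fine
```

The report was accurate when DN1 generated it; it only looks wrong because the NN's view of the block moved on while the RPC was delayed.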

