hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1972) HA: Datanode fencing mechanism
Date Wed, 14 Dec 2011 07:16:31 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169132#comment-13169132 ]

Todd Lipcon commented on HDFS-1972:

Yes, like Dhruba said, that's what the patch does. The slight added complexities are:

(a) track only the postponed over-replicated blocks to prevent having to take a lock and rescan
all the blocks once the last DN checks in.
(b) we need to actually have a heartbeat and *then* a BR (block report) from each DN. If we get
a BR immediately after the NN becomes active, there's a short window where the DN might still
receive a deletion request prior to its next heartbeat.
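
To make (a) and (b) concrete, here is a minimal sketch of the bookkeeping as I read it from the
comment above. It is not code from the patch: the class, method, and state names are all
hypothetical, datanodes are plain strings, and it simplifies by waiting for every DN in the
cluster rather than only the DNs holding replicas of a given block.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the HDFS-1972 patch. */
public class PostponedDeletionSketch {
    /** Progress of one datanode since this namenode became active. */
    private enum DnState { STALE, HEARTBEAT_SEEN, FRESH }

    private final Map<String, DnState> datanodes = new HashMap<>();
    // (a) Remember only the blocks whose deletion was postponed, so that
    // nothing has to rescan every block once the last DN checks in.
    private final Set<Long> postponedBlocks = new HashSet<>();

    public void registerDatanode(String dn) {
        datanodes.put(dn, DnState.STALE);
    }

    /** On becoming active, nothing known about any DN can be trusted yet. */
    public void becomeActive() {
        datanodes.replaceAll((dn, state) -> DnState.STALE);
    }

    public void onHeartbeat(String dn) {
        if (datanodes.get(dn) == DnState.STALE) {
            datanodes.put(dn, DnState.HEARTBEAT_SEEN);
        }
    }

    /**
     * (b) A block report only counts if a heartbeat arrived first.
     * Returns the postponed blocks that can now be reprocessed, which is
     * empty until every DN has sent a heartbeat followed by a report.
     */
    public Set<Long> onBlockReport(String dn) {
        if (datanodes.get(dn) == DnState.HEARTBEAT_SEEN) {
            datanodes.put(dn, DnState.FRESH);
        }
        if (!allFresh()) {
            return Collections.emptySet();
        }
        Set<Long> ready = new HashSet<>(postponedBlocks);
        postponedBlocks.clear();
        return ready;
    }

    /** Returns true if the deletion may be scheduled now; otherwise postpones it. */
    public boolean tryScheduleDeletion(long blockId) {
        if (allFresh()) {
            return true;
        }
        postponedBlocks.add(blockId);
        return false;
    }

    private boolean allFresh() {
        for (DnState state : datanodes.values()) {
            if (state != DnState.FRESH) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch a block report that arrives before any heartbeat leaves the DN marked stale, so
deletions stay postponed until the DN has sent a heartbeat and then a fresh report.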

@Dhruba: I considered your trick of reprocessing all the replicated blocks while holding only
the readlock. But, it seems this is still high-impact -- holding the readlock for potentially
10-20 seconds will block many operations including getBlockLocations (which updates access
time) as well as any namespace writes.
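
The impact of a long-held readlock can be illustrated with a small standalone sketch. It assumes
the namesystem lock behaves like a fair java.util.concurrent ReentrantReadWriteLock; the class
and method names are hypothetical, and the "writer" thread stands in for any operation that
needs the write lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch; not code from the NameNode. */
public class ReadLockImpactSketch {
    /**
     * Holds the read lock on the calling thread and reports whether a
     * writer on another thread obtains the write lock within the timeout.
     */
    public static boolean writerGetsLockWhileReadHeld(long timeoutMillis)
            throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
        boolean[] acquired = new boolean[1];

        lock.readLock().lock(); // stands in for the long reprocessing scan
        try {
            Thread writer = new Thread(() -> {
                try {
                    // Stands in for an operation that needs the write lock.
                    if (lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                        acquired[0] = true;
                        lock.writeLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            writer.start();
            writer.join(); // the read lock is still held while the writer waits
        } finally {
            lock.readLock().unlock();
        }
        return acquired[0];
    }
}
```

Under these assumptions the writer times out for as long as the read lock is held, which is the
stall described above, only scaled up to the 10-20 seconds of the scan.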
> HA: Datanode fencing mechanism
> ------------------------------
>                 Key: HDFS-1972
>                 URL: https://issues.apache.org/jira/browse/HDFS-1972
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node, ha, name-node
>            Reporter: Suresh Srinivas
>            Assignee: Todd Lipcon
>         Attachments: hdfs-1972-v1.txt, hdfs-1972.txt
> In high availability setup, with an active and standby namenode, there is a possibility
> of two namenodes sending commands to the datanode. The datanode must honor commands from only
> the active namenode and reject the commands from standby, to prevent corruption. This invariant
> must be complied with during fail over and other states such as split brain. This jira addresses
> issues related to this, design of the solution and implementation.
