hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
Date Tue, 03 Apr 2012 22:08:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245806#comment-13245806 ]

Todd Lipcon commented on HDFS-3077:

Hi Bikas. Thanks for bringing up this scenario. I do need to add a section to the doc about
failure handling and re-adding failed journals.

My thinking is that the granularity of "membership" is the log segment. This is similar to
what we do on local disks today - when we roll the edit log, we attempt to re-add any disks
that previously failed. Similarly, when we start a new log segment, we give all of the JNs
a chance to rejoin the quorum and follow along from that segment onward.

To try to map to your example, we'd have the following:
JN1: writing edits_inprogress_1 (@txn 100)
JN2: writing edits_inprogress_1 (@txn 100)
JN3: has been reformatted, comes back online

At this point, the QJM can try to write txns to all three, but JN3 won't accept transactions
because it doesn't have a currently open log segment. Currently it will just reject them.
I can imagine a future optimization in which it would return a special exception, and the
QJM could notify the NN that it would like to roll ASAP if possible.
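
To make the current behavior concrete, here is a rough Python sketch of the scenario above. It is
not the actual JournalNode or QJM code: the class, the method names, and NoOpenSegmentError are all
made up for illustration, and the only assumption taken from the design is that a write succeeds
when a majority of JNs accept it.

```python
class NoOpenSegmentError(Exception):
    """Hypothetical error: the JN has no in-progress segment to append to."""


class JournalNode:
    """Toy stand-in for a JN; tracks only the open segment and last txid."""

    def __init__(self, name):
        self.name = name
        self.open_segment_start = None  # first txid of the open segment, or None
        self.last_txid = 0

    def start_log_segment(self, first_txid):
        self.open_segment_start = first_txid

    def journal(self, txid):
        # A reformatted JN has no open segment, so it rejects the write.
        if self.open_segment_start is None:
            raise NoOpenSegmentError(self.name)
        self.last_txid = txid


def quorum_write(nodes, txid):
    """Send one txn to every JN; succeed if a majority accepts it."""
    acked, rejected = [], []
    for jn in nodes:
        try:
            jn.journal(txid)
            acked.append(jn.name)
        except NoOpenSegmentError:
            rejected.append(jn.name)
    if len(acked) <= len(nodes) // 2:
        raise RuntimeError("lost quorum for txid %d" % txid)
    return acked, rejected


# JN1 and JN2 are at txn 100 in edits_inprogress_1; JN3 was just reformatted.
jn1, jn2, jn3 = JournalNode("JN1"), JournalNode("JN2"), JournalNode("JN3")
for jn in (jn1, jn2):
    jn.start_log_segment(1)
    for txid in range(1, 101):
        jn.journal(txid)

acked, rejected = quorum_write([jn1, jn2, jn3], 101)
```

Here acked holds JN1 and JN2 and rejected holds JN3. A non-empty rejected list is the kind of
signal the QJM could use, in the future optimization mentioned above, to ask the NN for an early roll.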

Let's say we write another 20 txns, and then roll logs. On the next startLogSegment call,
we'd end up with the following:

JN1: edits_1-120, edits_inprogress_121
JN2: edits_1-120, edits_inprogress_121
JN3: edits_inprogress_121

so all nodes are now taking part in the quorum. We could optionally at this point have JN3
copy over the edits_1-120 segment from one of the other nodes, but that copy can be asynchronous.
It's a repair operation, but given we already have 2 valid replicas, we aren't in any imminent
danger of data loss.
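
The roll and the optional repair can be sketched the same way. Again this is only an illustration
in Python with hypothetical names (roll, missing_segments, and the JournalNode fields are not from
the patch); it models segments as (first_txid, last_txid) pairs and shows which finalized segments
a rejoined JN would still need to copy.

```python
class JournalNode:
    """Toy stand-in for a JN; tracks finalized and in-progress segments."""

    def __init__(self, name):
        self.name = name
        self.finalized = []      # list of (first_txid, last_txid) pairs
        self.in_progress = None  # first txid of the open segment, or None

    def finalize_segment(self, last_txid):
        # Only a JN that was following the segment has anything to finalize.
        if self.in_progress is not None:
            self.finalized.append((self.in_progress, last_txid))
            self.in_progress = None

    def start_log_segment(self, first_txid):
        self.in_progress = first_txid

    def files(self):
        names = ["edits_%d-%d" % seg for seg in self.finalized]
        if self.in_progress is not None:
            names.append("edits_inprogress_%d" % self.in_progress)
        return names


def roll(nodes, last_txid):
    """Finalize the current segment and start the next one on every JN."""
    for jn in nodes:
        jn.finalize_segment(last_txid)
        jn.start_log_segment(last_txid + 1)


def missing_segments(node, peers):
    """Finalized segments that some peer has but this node lacks."""
    known = set()
    for peer in peers:
        known.update(peer.finalized)
    return sorted(known - set(node.finalized))


jn1, jn2, jn3 = JournalNode("JN1"), JournalNode("JN2"), JournalNode("JN3")
for jn in (jn1, jn2):
    jn.start_log_segment(1)  # JN3 was reformatted and has no open segment

roll([jn1, jn2, jn3], 120)   # roll after txn 120

to_repair = missing_segments(jn3, [jn1, jn2])
```

After the roll, jn1.files() and jn2.files() list edits_1-120 and edits_inprogress_121, jn3.files()
lists only edits_inprogress_121, and to_repair contains the single segment (1, 120) that JN3 could
copy in the background.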

> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>                 Key: HDFS-3077
>                 URL: https://issues.apache.org/jira/browse/HDFS-3077
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ha, name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3077-partial.txt, qjournal-design.pdf
>
> Currently, one of the weak points of the HA design is that it relies on shared storage
> such as an NFS filer for the shared edit log. One alternative that has been proposed is to
> depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated
> edit log on commodity hardware. This JIRA is to implement another alternative, based on a
> quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only
> by HDFS's needs rather than more generic use cases. More details to follow.
