hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
Date Tue, 03 Apr 2012 21:06:28 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245736#comment-13245736 ]

Todd Lipcon commented on HDFS-3077:

bq. I think it will help clarify the doc, if you add the explanation for Hari's example. Even
though epoch 2 is persisted on JN1, its last log segment is still tied to epoch 1 and it needs
to sync its last log segment with JN2/JN3. Are you proposing that JN1 drop its last edits
in progress and pick up the corresponding finalized segment from JN2/JN3? Or is it TBD?

Yes, I think it would see that its copy of the segment is "out of date" epoch-wise, delete
it, and then copy the finalized segments from the other nodes later. I'll try to expand upon
this portion of the doc in the coming days.
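
As a rough illustration of the recovery step described above, here is a minimal sketch in Python. It is not taken from the patch or the design doc: the Segment and JournalNode classes, the promised_epoch field, and both method names are hypothetical, and the real JournalNode logic may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Segment:
    start_txid: int
    epoch: int        # epoch of the writer that produced this segment
    finalized: bool   # False while the segment is still "in progress"


@dataclass
class JournalNode:
    # Hypothetical model of a journal node, for illustration only.
    name: str
    promised_epoch: int  # newest epoch this node has persisted
    segments: Dict[int, Segment] = field(default_factory=dict)  # keyed by start txid

    def drop_stale_in_progress(self) -> Optional[int]:
        """Delete the first in-progress segment written under an older epoch.

        Returns the start txid of the dropped segment, or None if nothing was stale.
        """
        for start_txid, seg in list(self.segments.items()):
            if not seg.finalized and seg.epoch < self.promised_epoch:
                del self.segments[start_txid]
                return start_txid
        return None

    def copy_finalized_from(self, peer: "JournalNode") -> int:
        """Copy finalized segments that this node is missing; returns how many."""
        copied = 0
        for start_txid, seg in peer.segments.items():
            if seg.finalized and start_txid not in self.segments:
                self.segments[start_txid] = Segment(seg.start_txid, seg.epoch, True)
                copied += 1
        return copied
```

In this sketch, a node that has persisted epoch 2 but still holds an in-progress segment from epoch 1 would first call drop_stale_in_progress() and later call copy_finalized_from() against a peer that holds the finalized copy.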

I also have another idea which may be slightly simpler -- Suresh got me thinking about it
a bit. Basically the idea is that, instead of deleting empty edit logs, we could "fill them
in" with a single NOOP transaction. Let me think on this for a little while and then update
the design doc if it turns out to work.

bq. Btw, there is some new code here but there seems to be some code in existing NN that changes
the sequential journal sync to parallel (based on reading your doc and not your patch).

Nope, the thinking is that all of the new code will be encapsulated by QuorumJournalManager.
So, from the NN's perspective, there is only a single edit log. It happens that that edit
log is distributed and fault-tolerant underneath, but the NN would see it as a single "required"
journal, and crash if it fails to sync.
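
To make the "single required journal" view concrete, here is a small Python sketch. It is an illustration under assumptions, not the QuorumJournalManager API: the QuorumJournal class, QuorumSyncError, and the majority rule (n // 2 + 1 acknowledgements) are all hypothetical names and choices, and the writes are issued one after another for brevity rather than in parallel.

```python
from typing import Callable, List


class QuorumSyncError(Exception):
    """Raised when fewer than a majority of journal nodes acknowledge a write."""


class QuorumJournal:
    # Hypothetical wrapper: the caller sees one journal, backed by several nodes.
    def __init__(self, writers: List[Callable[[bytes], None]]):
        self.writers = writers
        # Assumption: a simple majority counts as a quorum.
        self.majority = len(writers) // 2 + 1

    def sync(self, edits: bytes) -> int:
        """Send edits to every node; succeed only if a majority acknowledge."""
        acks = 0
        for write in self.writers:
            try:
                write(edits)
                acks += 1
            except OSError:
                # A single failed node is tolerated as long as a quorum remains.
                pass
        if acks < self.majority:
            raise QuorumSyncError(
                f"only {acks} of {len(self.writers)} journal nodes acknowledged"
            )
        return acks
```

A caller that treats this journal as required would let QuorumSyncError propagate and abort, which mirrors the behavior described above: individual node failures are hidden underneath, while losing the quorum looks like the failure of the one required journal.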

bq. Are you planning on committing this to a branch or directly to trunk?

I'm happy to do either. Suresh seemed to think doing it on a branch would be counter-productive
to code sharing. In practice it's almost all new code, so as long as we're clear to mark it "in-progress"
or "experimental", I don't think it would be destabilizing to do in trunk. HDFS-3190 is the
one place in which I've modified NN code, but only trivially.
> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>                 Key: HDFS-3077
>                 URL: https://issues.apache.org/jira/browse/HDFS-3077
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ha, name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3077-partial.txt, qjournal-design.pdf
> Currently, one of the weak points of the HA design is that it relies on shared storage
> such as an NFS filer for the shared edit log. One alternative that has been proposed is to
> depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated
> edit log on commodity hardware. This JIRA is to implement another alternative, based on a
> quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only
> by HDFS's needs rather than more generic use cases. More details to follow.
