hadoop-hdfs-issues mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
Date Tue, 09 Oct 2012 03:52:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472112#comment-13472112 ]

Sanjay Radia commented on HDFS-3077:

The updated journal file isn't sufficient because it doesn't record information about whether
it was an accepted recovery proposal or whether it was just left over at the last write. You
need to ensure the property that, if the recovery coordinator thinks a value is accepted,
then no different recovery will be accepted in the future (otherwise you risk having two different
finalized lengths for the same log segment). In order to do so, you need to wait until a quorum
of nodes are Finalized before you know that any future recovery will be able to rely only
on the finalization state.
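
A minimal sketch of the quorum condition described above, assuming each journal node reports one of three states for the segment. The class, enum, and method names are hypothetical and are not taken from the HDFS-3077 patch. It relies only on the fact that any two majorities of the same node set share at least one node, so a later recovery that contacts a majority must see at least one Finalized node.

```java
import java.util.List;

// Hypothetical illustration only; not the QuorumJournalManager implementation.
final class RecoveryQuorum {
    // Assumed per-node states for a single log segment.
    enum SegmentState { IN_PROGRESS, ACCEPTED_RECOVERY, FINALIZED }

    // Smallest number of nodes that forms a majority of n nodes.
    static int majority(int n) {
        return n / 2 + 1;
    }

    // True once a majority of nodes report FINALIZED. Only then can the
    // coordinator treat the finalized length as stable, because every
    // future majority overlaps this one in at least one node.
    static boolean finalizedByQuorum(List<SegmentState> states) {
        long finalized = states.stream()
                .filter(s -> s == SegmentState.FINALIZED)
                .count();
        return finalized >= majority(states.size());
    }
}
```

With three nodes, two Finalized responses satisfy the check, while one Finalized node plus one node holding only an accepted recovery proposal does not.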

I don't know enough about the details of the ZAB implementation to understand why they can
get away without this, if in fact they can. My guess is that it's because the transaction
IDs themselves have the epoch number as their high order bits, and hence you can't ever confuse
the first txn of epoch N+1 with the last transaction of epoch N.
Yes, ZAB avoids this because epoch and txid are combined.
Let's please add the counterexample that you describe above to the doc (if it is already there,
just add a comment that the example explains why the extra persistent info is needed).
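
The ZAB point above (epoch in the high-order bits of the transaction ID) can be sketched as follows. This is an illustration, not ZooKeeper code: the class and method names are hypothetical, and the 32/32 bit split is assumed here to match ZooKeeper's 64-bit zxid layout.

```java
// Hypothetical illustration of an epoch-prefixed transaction ID.
final class EpochTxId {
    // Assumption: the low 32 bits hold the per-epoch counter.
    static final int COUNTER_BITS = 32;
    static final long COUNTER_MASK = 0xFFFFFFFFL;

    // Pack the epoch into the high-order bits and the counter into the low-order bits.
    static long make(long epoch, long counter) {
        return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK);
    }

    // Recover the epoch from the high-order bits.
    static long epochOf(long id) {
        return id >>> COUNTER_BITS;
    }

    // Recover the per-epoch counter from the low-order bits.
    static long counterOf(long id) {
        return id & COUNTER_MASK;
    }
}
```

Because the epoch occupies the high-order bits, the first ID of epoch N+1 always compares greater than, and is distinct from, the last possible ID of epoch N, so the two cannot be confused.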
> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>                 Key: HDFS-3077
>                 URL: https://issues.apache.org/jira/browse/HDFS-3077
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ha, name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: QuorumJournalManager (HDFS-3077)
>         Attachments: hdfs-3077-partial.txt, hdfs-3077-test-merge.txt, hdfs-3077.txt,
hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt,
qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf,
qjournal-design.pdf, qjournal-design.tex, qjournal-design.tex
> Currently, one of the weak points of the HA design is that it relies on shared storage
such as an NFS filer for the shared edit log. One alternative that has been proposed is to
depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated
edit log on commodity hardware. This JIRA is to implement another alternative, based on a
quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only
by HDFS's needs rather than more generic use cases. More details to follow.

