hadoop-hdfs-issues mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
Date Fri, 05 Oct 2012 19:14:07 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470572#comment-13470572 ]

Sanjay Radia commented on HDFS-3077:
------------------------------------

Suresh and I have compared the design with Paxos and ZAB in detail and have concluded that
it is closer to ZAB than to Paxos.
* In both ZAB and the HDFS-3077 design, recovery establishes a leader and syncs missing
transactions across a number of journal participants. Once recovery completes, the leader
writes future transactions to the journal participants.
* The txid (called zxid in ZAB) is used in similar ways in both, except that in ZAB the epoch
is part of the transaction id.
* The recovery process discovers the highest txid and then arranges to sync the missing
transactions across the participant journals.
* The steps are very similar, except that the HDFS-3077 design has an extra initial step. If
newEpoch and prepareRecovery are merged, the HDFS-3077 design becomes the same as ZAB.
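
To make the merged step concrete, here is a rough sketch of recovery with newEpoch and
prepareRecovery folded into a single call. It is an illustration only, not the HDFS-3077
or ZAB implementation: the Journal class, its methods, and the in-memory storage are all
hypothetical, txids are assumed to be contiguous and to start at 1, and the sync source is
picked by highest txid alone, which ignores corner cases a real protocol has to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Hypothetical in-memory journal participant (not an HDFS class)."""
    name: str
    promised_epoch: int = 0
    txns: dict = field(default_factory=dict)  # txid -> payload

    def new_epoch(self, epoch):
        """Merged step: promise the epoch and report the last txid held."""
        if epoch <= self.promised_epoch:
            raise ValueError(f"{self.name}: epoch {epoch} is stale")
        self.promised_epoch = epoch
        return max(self.txns, default=0)

    def accept(self, epoch, txid, payload):
        """Store a transaction sent by the leader of the given epoch."""
        if epoch < self.promised_epoch:
            raise ValueError(f"{self.name}: epoch {epoch} is stale")
        self.txns[txid] = payload


def recover(journals, epoch):
    """Establish the epoch on a quorum, then sync missing transactions."""
    quorum = len(journals) // 2 + 1
    responses = []
    for journal in journals:
        try:
            responses.append((journal, journal.new_epoch(epoch)))
        except ValueError:
            continue  # this participant already promised a newer epoch
    if len(responses) < quorum:
        raise RuntimeError("no quorum of journal participants")

    # Discover the highest txid among the responders (simplified choice).
    source, highest = max(responses, key=lambda r: r[1])

    # Copy whatever each responder is missing from the most advanced one.
    for journal, last in responses:
        for txid in range(last + 1, highest + 1):
            journal.accept(epoch, txid, source.txns[txid])
    return highest
```

After recover() returns, every participant that answered holds the same transactions up to
the returned txid, and the new leader can start writing from the next txid.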

The proposal is to merge the first two steps, model the protocol after ZAB, and use ZAB
terminology. We have discussed some of the implementation details with Mahadev of the ZK team
and can benefit from insights into some of ZK's lower-level details and the corner cases they
deal with. There are some details about what is persisted, and when, that we would like to
discuss further.
                
> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>
>                 Key: HDFS-3077
>                 URL: https://issues.apache.org/jira/browse/HDFS-3077
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ha, name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: QuorumJournalManager (HDFS-3077)
>
>         Attachments: hdfs-3077-partial.txt, hdfs-3077-test-merge.txt, hdfs-3077.txt,
> hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt,
> qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf,
> qjournal-design.pdf, qjournal-design.tex, qjournal-design.tex
>
>
> Currently, one of the weak points of the HA design is that it relies on shared storage
> such as an NFS filer for the shared edit log. One alternative that has been proposed is to
> depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated
> edit log on commodity hardware. This JIRA is to implement another alternative, based on a
> quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only
> by HDFS's needs rather than more generic use cases. More details to follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
