hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
Date Mon, 12 Mar 2012 18:58:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227780#comment-13227780
] 

Todd Lipcon commented on HDFS-3077:
-----------------------------------

I plan to post a more thorough design doc in the next week or two, but my current rough thinking
on the design is:

- Implement a standalone daemon which hosts the JournalProtocol interface, or something similar
to it. This daemon is responsible for accepting edits from the active master and writing them
locally. We will need to extend the interface to also allow reading from the logs, and a recovery
operation (see below)
- The user would configure an odd number of these logger daemons (could be collocated with
the NNs or even with DNs, given a dedicated disk)
- Create a JournalManager implementation which uses a quorum protocol to commit edits to the
logger daemons.

As for the particular quorum protocol, I am leaning towards implementing ZAB -- its properties
align very closely to the semantics we need.

I see the following advantages over the BookKeeper approach:
- Re-uses existing Hadoop subsystems like IPC, security, and the file-based edit logging code.
This means that it will be easier to maintain for the Hadoop development community, and easier
to deploy for Hadoop operations.
- Doesn't introduce a new dependency on an external project. If there is a bug discovered
in this code, we can fix it with a new Hadoop release without having to wait on a new release
of ZooKeeper. Since ZK and HDFS may be managed by different ops teams, this also simplifies
upgrade.
- BookKeeper is a general system, whereas this is a specific system. Since BK tries to be
quite general, it has extra complexity that we don't need. For example, it handles the interleaving
of up to thousands of distinct edit logs into a single on-disk layout. These complexities
are useful for a general "write-ahead log as a service" project, but not for our use case
where even very large clusters have only a handful of distinct logs.
- BookKeeper's commit protocol waits for all replicas to commit. This means that, should one
of the bookies fail, one must wait for a rather lengthy timeout before continuing. Additionally,
the latency of a commit is the maximum of the latency of the bookies, meaning that it's much
less feasible to collocate bookies with other machines under load like DataNodes. A quorum
commit protocol instead has a latency equal to the median of its replicas' latencies, allowing
it to ride over transient slowness on the part of one of its replicas.

The BK approach is still interesting for some deployments, and I'd advocate that we explore
both options in parallel. If one approach as implemented turns out to have significant advantages,
we can at that time choose to support only one, but in the meantime, I think it's worth developing
both approaches.




                
> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>
>                 Key: HDFS-3077
>                 URL: https://issues.apache.org/jira/browse/HDFS-3077
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ha, name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> Currently, one of the weak points of the HA design is that it relies on shared storage
such as an NFS filer for the shared edit log. One alternative that has been proposed is to
depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated
edit log on commodity hardware. This JIRA is to implement another alternative, based on a
quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only
by HDFS's needs rather than more generic use cases. More details to follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message