hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
Date Wed, 14 Mar 2012 19:24:42 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229527#comment-13229527 ]

Todd Lipcon commented on HDFS-3077:
-----------------------------------

bq. These arguments seem very much to be a case of NIH.
No, they're an argument for uniformity of code base. Hadoop's already a large project. Briefly
skimming the BK code, I see:
- A new NIO server which we'll have to understand and probably bugfix (we've spent literally
years working on our own NIO server for IPC)
- A bunch of ad-hoc serialization code (e.g. in BookieServer.java). We just spent a long time
making Hadoop wire-compatible using protobufs. We don't want to inherit more code which uses
ad-hoc serialization.
- No metrics subsystem at all - we want to continue to make use of the existing metrics implementation
in Hadoop
- No SASL or SSL implementation. On-the-wire encryption is a requirement we're hearing more
and more in Hadoop. Hadoop IPC already gives us SASL-based encryption.
- Password-based authentication instead of Kerberos-based. One more password to configure.
- Its own on-disk format for logs. So if you take a backup from a bookie, you can't use tools
like the OEV to view them
- A different file format, etc

Certainly, it's a "small matter of code" to add all of these things to BookKeeper. But given
that BK is primarily a project maintained by a research organization, and none of the above
are at all interesting from a research perspective, I don't think it's likely to happen any
time soon.

That said, there is a valid NIH concern, or really a not-maintained-here concern. As I said
above, if we have a bug in BK, we need to (a) convince someone on the BK team to fix it, (b)
get it into ZK trunk, (c) get the ZK team to make a new release, (d) check Hadoop against any
_other_ new changes in that release, and (e) convince an operations team, which may be distinct
from the Hadoop ops team, to update the ZooKeeper installation. That's really painful. If BK
were a mature project with tons of production users, I'd agree we should just depend on it,
since the number of bugs we'd likely find would be very low.

Anyway, this JIRA isn't to argue against BookKeeper. If you want to keep exploring it, please
go ahead - the advantage of a pluggable interface here is that different implementations may
coexist.

bq. Also, I don't think ZAB is the right tool for this in any case. You have a single writer,
which can therefore act as a sequencer on the entries. You just need to broadcast to an ensemble,
and wait for quorum responses, as I outlined above for BookKeeper.

We have a single writer, except for when we don't. During a failover, without a STONITH capability,
we may have overlapping writers. Please see the examples above for why we need sequencing
of multiple writers.
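
To make the overlapping-writer problem concrete, here is a minimal sketch of one common way to sequence multiple writers: each writer claims a monotonically increasing epoch from a quorum of log nodes, and nodes reject appends from any older epoch. This is an illustration under assumptions, not the design proposed in this JIRA (whose details are still to follow). The names LogNode, Writer, new_epoch, append, become_active and write are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LogNode:
    """One member of the ensemble that stores edit-log entries (hypothetical)."""
    promised_epoch: int = 0
    entries: dict = field(default_factory=dict)  # txid -> (epoch, data)

    def new_epoch(self, epoch: int) -> bool:
        # Accept a new writer only if its epoch is higher than any seen so far.
        if epoch <= self.promised_epoch:
            return False
        self.promised_epoch = epoch
        return True

    def append(self, epoch: int, txid: int, data: str) -> bool:
        # Reject appends from any writer other than the one most recently promised.
        if epoch != self.promised_epoch:
            return False
        self.entries[txid] = (epoch, data)
        return True


class Writer:
    """A writer (e.g. an active NameNode) that needs a majority for every step."""

    def __init__(self, nodes, epoch: int):
        self.nodes = nodes
        self.epoch = epoch
        self.quorum = len(nodes) // 2 + 1

    def become_active(self) -> bool:
        # Claim the epoch on every reachable node; succeed on a majority of acks.
        acks = sum(node.new_epoch(self.epoch) for node in self.nodes)
        return acks >= self.quorum

    def write(self, txid: int, data: str) -> bool:
        # Broadcast the entry; it is committed only if a majority accepts it.
        acks = sum(node.append(self.epoch, txid, data) for node in self.nodes)
        return acks >= self.quorum


# Failover without STONITH: the old writer is still running when the new one starts.
nodes = [LogNode() for _ in range(3)]
old = Writer(nodes, epoch=1)
old_active = old.become_active()
first_write = old.write(1, "mkdir /a")

new = Writer(nodes, epoch=2)
new_active = new.become_active()
stale_write = old.write(2, "rm /a")     # old writer no longer reaches a quorum
fresh_write = new.write(2, "mkdir /b")  # new writer does
```

Because any two majorities intersect, once the new writer's epoch is accepted by a quorum, the old writer cannot gather a quorum for further entries, even if it is still alive and still believes it is active. A pure "single writer broadcasts and waits for a quorum" scheme has no such check.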

                
> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>
>                 Key: HDFS-3077
>                 URL: https://issues.apache.org/jira/browse/HDFS-3077
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ha, name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> Currently, one of the weak points of the HA design is that it relies on shared storage
> such as an NFS filer for the shared edit log. One alternative that has been proposed is to
> depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated
> edit log on commodity hardware. This JIRA is to implement another alternative, based on a
> quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only
> by HDFS's needs rather than more generic use cases. More details to follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
