Date: Thu, 28 Jun 2012 18:55:46 +0000 (UTC)
From: "Todd Lipcon (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs

    [ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403363#comment-13403363 ]

Todd Lipcon commented on HDFS-3077:
-----------------------------------

bq. Quorum is a semantics on the client/writer side and not the server side policy.
Hence the protocol for journaling should be generic enough. So lets not call it QJournalProtocol and make it generic, allowing other types of clients/writers.

I disagree with this statement. The commit protocol is strongly intertwined with the way the server has to behave. For example, the "new epoch" command needs to return certain information about the current state of the journals and about previous Paxos-style 'accepted' decisions. Trying to shoehorn it into a generic protocol doesn't make much sense to me.

bq. If you see some functionality missing in 3092, lets discuss and add it there, instead of copying code and changing it separately

3092's "log syncing" stuff doesn't fit with the recovery protocol needed for correct operation in a quorum commit setting. 3092's method of the JNs "registering" with the NN doesn't make sense in this system either, since group membership changes are not yet part of the design and are quite complex. So it's not a matter of adding functionality to 3092; it's a matter of removing a lot of functionality that just doesn't fit with this commit protocol.

bq. Also 3092 has been in development in open, in incremental fashion. I think we should follow this, instead of attaching a big patch from github.

I made a best effort to do it in the open and incrementally, but didn't get any responses from the community. See HDFS-3188 and HDFS-3189, for example, both of which I posted back in April. I remember that, in the same discussions you referenced above, you said you'd take a look at these in the spirit of incremental progress. I understand you got busy with other things, but I wasn't going to stop working on the project in the meantime. So work progressed, and now there's a more fully baked implementation here.

Don't be fooled by the size of the patch: the majority of the lines of code are essentially boilerplate (protobuf translators, simple code to start/stop RPC and HTTP servers, etc.).
I don't think this is unreasonably large to review.

> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>
>                 Key: HDFS-3077
>                 URL: https://issues.apache.org/jira/browse/HDFS-3077
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ha, name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3077-partial.txt, hdfs-3077.txt, qjournal-design.pdf, qjournal-design.pdf
>
>
> Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware.
> This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow.