Date: Sat, 11 Jan 2014 13:53:01 +0000 (UTC)
From: "Feng Honghua (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency

[ https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868787#comment-13868787 ]

Feng Honghua commented on HBASE-10296:
--------------------------------------

bq. but that ZK path is used to find the hbase master even if it moves round a cluster -what would happen there?

Typically we adopt master-based Paxos in practice, so naturally the master process hosting the master Paxos replica is the active master. The active master is elected by the Paxos protocol, not by ZK, and each standby master knows who the current active master is.
When the active master moves around (for instance when the active master dies or its lease times out), a client or app that attempts to talk to the old active master will fail in one of two ways: it fails to connect if the active master has died, or the old master replies that it is no longer the active master and returns the current active master's info. In the former case the client/app tries another live master instance at random; that master accepts the request if it is the new active master, or otherwise tells the client who the current active master is. In the latter case the client can now talk to the active master directly. And, just as when accessing ZK, the client/app should know the addresses of the master ensemble in order to access an HBase cluster.

(Assuming you mean finding the active master; correct me if I'm wrong.)

> Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10296
>                 URL: https://issues.apache.org/jira/browse/HBASE-10296
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: master, Region Assignment, regionserver
>            Reporter: Feng Honghua
>
> Currently the master relies on ZK to elect the active master, monitor liveness, and store almost all of its state, such as region states, table info, replication info and so on.
> ZK also serves as a channel for master-regionserver communication (such as in region assignment) and client-regionserver communication (such as replication state/behavior changes).
> But ZK as a communication channel is fragile because of its one-time watches and asynchronous notification mechanism, which together can lead to missed events (and hence missed messages). For example, the master must rely on the idempotence of the state transition logic to keep the region assignment state machine correct. In fact, almost all of the trickiest inconsistency issues can be traced back to the fragility of ZK as a communication channel.
> Replacing ZK with Paxos running within the master processes has the following benefits:
> 1. Better master failover performance: all masters, whether active or standby, hold the same latest state in memory (except lagging ones, which eventually catch up). Whenever the active master dies, the newly elected active master can immediately take over without failover work such as rebuilding its in-memory state by consulting the meta table and ZK.
> 2. Better state consistency: the master's in-memory state is the only truth about the system, which eliminates inconsistency from the very beginning. And although the state is held by all masters, Paxos guarantees the copies are identical at any time.
> 3. A more direct and simple communication pattern: clients change state by sending requests to the master, and the master and regionservers talk directly to each other via requests and responses. None of them needs a third-party store like ZK, which can introduce more uncertainty, worse latency and more complexity.
> 4. ZK is then needed only for liveness monitoring, to determine whether a regionserver is dead, and later on we can eliminate ZK entirely once we build heartbeats between master and regionservers.
> I know this might look like a very crazy re-architecture, but it deserves deep thinking and serious discussion, right?

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
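
The client-side lookup described in the comment (try a master, fall back to a random live one on connection failure, follow the redirect when a standby names the active master) can be sketched roughly as follows. This is only an illustrative in-memory simulation, not HBase code: FakeMaster, Reply, MasterUnreachable and find_active_master are hypothetical names, and the shared "cluster" dict stands in for the Paxos-replicated knowledge of who is active.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional


class MasterUnreachable(Exception):
    """Hypothetical stand-in for a failed connection to a dead master."""


@dataclass
class Reply:
    accepted: bool              # True if the contacted master is the active one
    active_hint: Optional[str]  # address of the current active master, if known


class FakeMaster:
    """Simulated master; 'cluster' is a shared dict holding the active address."""

    def __init__(self, address: str, cluster: dict):
        self.address = address
        self.cluster = cluster
        self.alive = True

    def handle(self) -> Reply:
        if not self.alive:
            raise MasterUnreachable(self.address)
        active = self.cluster["active"]
        return Reply(accepted=(active == self.address), active_hint=active)


def find_active_master(masters: Dict[str, FakeMaster],
                       cached: Optional[str] = None,
                       rng: Optional[random.Random] = None,
                       max_attempts: int = 10) -> str:
    """Return the address of the active master, starting from a cached guess."""
    rng = rng or random.Random()
    dead = set()
    target = cached if cached in masters else rng.choice(sorted(masters))
    for _ in range(max_attempts):
        try:
            reply = masters[target].handle()
        except MasterUnreachable:
            # Failure mode 1: cannot connect, so try another master at random.
            dead.add(target)
            candidates = [a for a in sorted(masters) if a not in dead]
            if not candidates:
                break
            target = rng.choice(candidates)
            continue
        if reply.accepted:
            return target
        # Failure mode 2: a standby answered and named the current active master.
        hint = reply.active_hint
        if hint in masters and hint not in dead and hint != target:
            target = hint
        else:
            candidates = [a for a in sorted(masters)
                          if a not in dead and a != target]
            if not candidates:
                break
            target = rng.choice(candidates)
    raise RuntimeError("no active master found")
```

As in the comment, the client only needs the list of master addresses up front; a stale cached address costs at most a failed connect plus one redirect in this sketch.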