Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Sat, 11 Jan 2014 13:53:01 +0000 (UTC)
From: "Feng Honghua (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12687801.1389156015871.82272.1389448381990@arcas>
In-Reply-To: <JIRA.12687801.1389156015871@arcas>
References: <JIRA.12687801.1389156015871@arcas>
Subject: [jira] [Commented] (HBASE-10296) Replace ZK with a paxos running
 within master processes to provide better master failover performance and
 state consistency
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868787#comment-13868787 ]

Feng Honghua commented on HBASE-10296:
--------------------------------------

bq.but that ZK path is used to find the hbase master even if it moves round a cluster -what would happen there?
Typically we adopt master-based Paxos in practice, so the master process hosting the master Paxos replica is naturally the active master. The active master is elected by the Paxos protocol, not by ZK, and each standby master knows who the current active master is. When the active master moves (for instance, when the active master dies or its lease times out), a client or app that attempts to talk to the old active master will fail in one of two ways: it fails to connect if the old active master is dead, or it is told that this master is no longer active, along with the current active master's info. In the former case the client/app randomly tries another live master instance, which accepts the request if it is the new active master, or otherwise returns the current active master's info. In the latter case the client/app can talk to the new active master right away. And, just as when accessing ZK, the client/app needs to know the addresses of the master ensemble to access an HBase cluster. (I'm assuming you mean finding the active master; correct me if I'm wrong.)
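
The lookup described above can be sketched roughly as follows. This is only an illustration of the two failure cases, not HBase code: `find_active_master`, `NotActiveMaster`, `make_fake_cluster` and the `call` stub are all hypothetical names, and the real RPC and retry policy would differ.

```python
import random


class NotActiveMaster(Exception):
    """Raised by a standby master; carries its view of the active master."""

    def __init__(self, active_hint):
        super().__init__(f"not active, try {active_hint}")
        self.active_hint = active_hint


def find_active_master(masters, call, start=None, max_attempts=10):
    """Return the address of the master that accepts `call`.

    masters: master addresses the client is configured with (the ensemble)
    call:    hypothetical RPC stub; returns normally on the active master,
             raises ConnectionError if the target is unreachable, and raises
             NotActiveMaster(hint) if the target is a standby.
    """
    unreachable = set()
    target = start if start is not None else random.choice(masters)
    for _ in range(max_attempts):
        try:
            call(target)
            return target
        except ConnectionError:
            # Case 1: the target is dead; try another master at random.
            unreachable.add(target)
            alive = [m for m in masters if m not in unreachable]
            if not alive:
                break
            target = random.choice(alive)
        except NotActiveMaster as redirect:
            # Case 2: a standby told us who the current active master is.
            target = redirect.active_hint
    raise RuntimeError("could not locate an active master")


def make_fake_cluster(active, dead=()):
    """Build a `call` stub for a pretend cluster (illustration only)."""
    def call(addr):
        if addr in dead:
            raise ConnectionError(addr)
        if addr != active:
            raise NotActiveMaster(active)
        return "ok"
    return call
```

With masters m1, m2 and m3, where m1 is dead and m3 is active, `find_active_master(["m1", "m2", "m3"], make_fake_cluster("m3", dead={"m1"}), start="m1")` ends at m3 whether the random retry lands on m2 (redirect) or on m3 (direct hit).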

> Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10296
>                 URL: https://issues.apache.org/jira/browse/HBASE-10296
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: master, Region Assignment, regionserver
>            Reporter: Feng Honghua
>
> Currently the master relies on ZK to elect the active master, monitor liveness, and store almost all of its state, such as region states, table info, replication info and so on. ZK also serves as a channel for master-regionserver communication (such as in region assignment) and client-regionserver communication (such as replication state/behavior changes).
> But ZK is fragile as a communication channel because of its one-time watch and asynchronous notification mechanism, which together can lead to missed events (and hence missed messages). For example, the master must rely on the idempotence of the state transition logic to keep the region assignment state machine correct. In fact, almost all of the trickiest inconsistency issues can be traced back to the fragility of ZK as a communication channel.
> Replacing ZK with Paxos running within the master processes has the following benefits:
> 1. Better master failover performance: all masters, whether active or standby, hold the same latest state in memory (except lagging ones, which eventually catch up). Whenever the active master dies, the newly elected active master can take over immediately, without failover work such as rebuilding its in-memory state by consulting the meta table and ZK.
> 2. Better state consistency: the masters' in-memory state is the only source of truth about the system, which eliminates inconsistency from the very beginning. And although the state is held by all masters, Paxos guarantees that the copies stay consistent: a lagging master may be behind, but it never diverges.
> 3. More direct and simpler communication pattern: clients change state by sending requests to the master, and the master and regionservers talk directly to each other via requests and responses. None of them needs to go through third-party storage like ZK, which can introduce more uncertainty, worse latency and more complexity.
> 4. ZK would then be used only for liveness monitoring, to determine whether a regionserver is dead, and later we can eliminate ZK entirely once we build heartbeats between the master and regionservers.
> I know this might look like a very crazy re-architecture, but it deserves deep thought and serious discussion, right?
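
The missed-event concern in the description (one-time watches plus asynchronous notification) can be reproduced with a toy model. The sketch below does not use the ZooKeeper client API: `ToyNode`, `observe_states` and the state labels are made up for illustration, and it only assumes that a watch fires once and that its notification may be delivered after later writes have already happened.

```python
from collections import deque


class ToyNode:
    """Toy stand-in for a znode with one-shot watches (not the ZooKeeper API)."""

    def __init__(self, value):
        self.value = value
        self._watches = []
        self.pending = deque()  # notifications queued but not yet delivered

    def read(self, watch=None):
        # Reading may register a watch that fires at most once.
        if watch is not None:
            self._watches.append(watch)
        return self.value

    def write(self, value):
        self.value = value
        # Every registered watch fires once and is then dropped.
        self.pending.extend(self._watches)
        self._watches = []

    def deliver(self):
        # Simulates the asynchronous arrival of queued notifications.
        while self.pending:
            self.pending.popleft()()


def observe_states(writes, deliver_after_each=False):
    """Return the values a watcher sees while `writes` are applied."""
    node = ToyNode("OFFLINE")
    seen = []

    def on_change():
        # On notification: read the current value and re-register the watch.
        seen.append(node.read(watch=on_change))

    seen.append(node.read(watch=on_change))
    for value in writes:
        node.write(value)
        if deliver_after_each:
            node.deliver()
    node.deliver()
    return seen
```

When both writes land before the notification is delivered, `observe_states(["OPENING", "OPENED"])` yields `["OFFLINE", "OPENED"]`: the watcher never sees the intermediate state. With `deliver_after_each=True` it sees all three. A watcher in the first situation can only stay correct if its transition handling tolerates skipped or repeated steps, which is the idempotence requirement mentioned in the description.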


--
This message was sent by Atlassian JIRA
(v6.1.5#6160)