Date: Sat, 18 Jan 2014 00:55:21 +0000 (UTC)
From: "Devaraj Das (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-10070) HBase read high-availability using eventually consistent region replicas

    [ https://issues.apache.org/jira/browse/HBASE-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875452#comment-13875452 ]

Devaraj Das commented on HBASE-10070:
-------------------------------------

bq. Ok on the timing. You know how I feel about 1.0 – sooner rather than later – but hopefully this feature gets done in time.

Yeah.. a couple of us are on it.

bq. After thinking more on this, I 'get' why you have the replicas listed inside the row rather than as rows themselves [in hbase:meta]. The row in hbase:meta becomes a proxy or facade for the little cluster of regions, one of which is the primary with the others read replicas.

That's great. A copy-paste of what I said in the RB on HBASE-10347, for others' reference:

"Enis and I had debated this as well. The consensus between us was that we don't need to add new META rows for the replicas. After all, the HRI information is exactly the same for all the replicas except for the replicaID. In the current meta, we already have a column for the location of a region. It seemed logical to just extend that model - add newer columns for the replica locations (and similarly for the other columns like seqnum). That way everything for a particular user-visible region stays in one row (and it makes it easier for readers to learn all the replica locations from that one row). Regarding special casing, yes, there is some special casing in the way the regions are added to the meta - create table will create all the regions (if the table was created with replica count > 1), but only the primary regions will be added to the meta. The regionserver, when it updates the meta with the location after it opens a region, invokes the API passing the replicaID as an argument; the column names differ based on whether the replicaID is the primary's or not. These are pretty much the special cases for the meta updates."
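To make the single-row layout described above concrete, here is a minimal sketch of a meta update under that scheme. The suffixed qualifiers (server_0001, seqnumDuringOpen_0001, ...) and the helper names are assumptions for illustration only, not necessarily what the HBASE-10347 patch writes; the point is just that replica 0 keeps the existing column names while higher replica IDs get their own columns in the same row.

{code:java}
// Illustrative only: the per-replica qualifier suffix scheme below is an
// assumption, not the actual naming used by the HBASE-10347 patch.
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class MetaReplicaLayoutSketch {

  private static final byte[] INFO = Bytes.toBytes("info");

  /** Location column for a replica; the primary (replica 0) keeps the
   *  existing unsuffixed qualifier. */
  static byte[] serverColumn(int replicaId) {
    return replicaId == 0
        ? Bytes.toBytes("server")
        : Bytes.toBytes("server_" + String.format("%04d", replicaId));
  }

  /** Seqnum column for a replica, following the same suffix convention. */
  static byte[] seqnumColumn(int replicaId) {
    return replicaId == 0
        ? Bytes.toBytes("seqnumDuringOpen")
        : Bytes.toBytes("seqnumDuringOpen_" + String.format("%04d", replicaId));
  }

  /** One meta row per user-visible region: a location/seqnum column pair
   *  per replica, all written into the same row. */
  static Put metaRowFor(byte[] regionName, String[] serverNames, long[] seqNums) {
    Put put = new Put(regionName);
    for (int replicaId = 0; replicaId < serverNames.length; replicaId++) {
      put.add(INFO, serverColumn(replicaId), Bytes.toBytes(serverNames[replicaId]));
      put.add(INFO, seqnumColumn(replicaId), Bytes.toBytes(seqNums[replicaId]));
    }
    return put;
  }
}
{code}

With a layout like this, a reader that fetches the one meta row learns every replica's location in a single lookup, which is the property the paragraph above is after.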
bq. HRegionInfo now is overloaded. Before it was the info on a specific region. Now it is trying to serve two purposes; its original intent and now too as a descriptor on the region-serving 'cluster' made of a primary and replicas. Let's avoid overloading what up to now has had a clear role in the hbase model.

By doing it the way we have in the patch on HBASE-10347, it seems to reflect what's going on - "HRI is a logical descriptor and a facade for a bunch of primary & replicas". That's how we store things in the meta and how we reconstruct HRIs from the meta when needed.

There are possibly other approaches to doing this. For example, extend HRegionInfo as, say, HRegionInfoReplica and maintain the information about the replicaID there, and/or change all the relevant methods to accept HRegionInfoReplica and potentially return it as well in relevant situations. The issue is that those approaches would be very intrusive, and we would still need special cases for whether replicaID == 0 or not. I am not confident how much we would gain there. Is it too much to ask to change the view of what an HRI means (to what you say above)? Anyway, let me ponder a bit on this...

bq. The primary holds the 'pole position' being the name of the region in meta. The read replicas are differently named with the 00001 and 00002, etc., interpolated into the middle of the region name. I suppose doing it this way 'minimizes' the disturbance in the code base but I'm worried this naming exception will only confuse though it minimizes change. Why would the primary not be named like the replica regions?

I don't mind naming the primary regions similarly to the replicas. This might mean tools that currently depend on the name format would break even if the cluster is not deploying tables with replicas (you guessed that response :-)). But yeah, if you go the full Paxos route, the 'primary' could be any member of the replica-set, and there it makes sense for all members of the set to have an index.
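To make the alternative discussed above concrete, here is a rough sketch of a wrapper that carries the replicaID outside HRegionInfo. The class, its methods, and the _000N name suffix are hypothetical, not code from the HBASE-10347 patch; it mainly shows why the replicaID == 0 special case survives either way - something still has to decide whether the primary gets the plain region name or a suffixed one.

{code:java}
// Hypothetical sketch, not from any patch: carries the replica ID in a thin
// wrapper instead of inside HRegionInfo itself.
import org.apache.hadoop.hbase.HRegionInfo;

public class HRegionInfoReplica {

  public static final int PRIMARY_REPLICA_ID = 0;

  private final HRegionInfo regionInfo; // the logical region, shared by all replicas
  private final int replicaId;          // 0 = primary, 1..n = read replicas

  public HRegionInfoReplica(HRegionInfo regionInfo, int replicaId) {
    this.regionInfo = regionInfo;
    this.replicaId = replicaId;
  }

  public HRegionInfo getRegionInfo() { return regionInfo; }

  public int getReplicaId() { return replicaId; }

  public boolean isPrimary() { return replicaId == PRIMARY_REPLICA_ID; }

  /** Replica-qualified name, e.g. "<encodedName>_0001". The primary keeps the
   *  plain encoded name - exactly the naming special case discussed above. */
  public String getReplicaQualifiedName() {
    return isPrimary()
        ? regionInfo.getEncodedName()
        : regionInfo.getEncodedName() + "_" + String.format("%04d", replicaId);
  }
}
{code}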
> HBase read high-availability using eventually consistent region replicas
> -------------------------------------------------------------------------
>
>                 Key: HBASE-10070
>                 URL: https://issues.apache.org/jira/browse/HBASE-10070
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>         Attachments: HighAvailabilityDesignforreadsApachedoc.pdf
>
>
> In the present HBase architecture, it is hard, probably impossible, to satisfy constraints like "the 99th percentile of reads will be served under 10 ms". One of the major factors that affects this is the MTTR for regions. There are three phases in the MTTR process - detection, assignment, and recovery. Of these, detection is usually the longest and is presently on the order of 20-30 seconds. During this time, clients are not able to read the region data.
> However, some clients will be better served if regions are available for eventually consistent reads during recovery. This will help satisfy low-latency guarantees for the class of applications that can work with stale reads.
> For improving read availability, we propose a replicated read-only region serving design, also referred to as secondary regions, or region shadows. Extending the current model of a region being opened for reads and writes in a single region server, the region will also be opened for reading in other region servers. The region server which hosts the region for reads and writes (as in the current case) will be declared PRIMARY, while 0 or more region servers might be hosting the region as SECONDARY. There may be more than one secondary (replica count > 2).
> Will attach a design doc shortly which contains most of the details and some thoughts about development approaches. Reviews are more than welcome.
> We also have a proof of concept patch, which includes the master and region server side of changes. Client side changes will be coming soon as well.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)