hbase-issues mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10070) HBase read high-availability using eventually consistent region replicas
Date Sat, 18 Jan 2014 00:55:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875452#comment-13875452 ]

Devaraj Das commented on HBASE-10070:

bq. Ok on the timing. You know how I feel about 1.0 – sooner rather than later – but hopefully
this feature gets done in time.

Yeah, a couple of us are on it.

bq. After thinking more on this, I 'get' why you have the replicas listed inside in the row
rather than as rows themselves [in hbase:meta]. The row in hbase:meta becomes a proxy or facade
for the little cluster of regions one of which is the primary with the others read replicas.

That's great. Here is a copy-paste of what I said in the RB on HBASE-10347, for others' reference.
"Enis and I had debated this as well. The consensus between us was that we don't need to add
new META rows for the replicas. After all, the HRI information is exactly the same for all
the replicas except for the replicaID. In the current meta, we already have a column for the
location of a region. It seemed logical to just extend that model - add newer columns for
the replica locations (and similarly for the other columns like seqnum). That way everything
for a particular user-visible region stays in one row (and makes it easier for readers to
know about all replica locations from that one row). Regarding special casing, yes there is
some special casing in the way the regions are added to the meta - create table will create
all regions (if the table was created with replica > 1), but only the primary regions will
be added to the meta. When the regionserver updates the meta with the location after it
opens a region, it invokes the API passing the replicaID as an argument; the column names
differ depending on whether the replicaID is the primary's or not. These are pretty much the
special cases for the meta updates."
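
To make the single-row layout described above concrete, here is a minimal sketch, not the code from the HBASE-10347 patch. It models one meta row as a plain dictionary and assumes that the primary is replica 0 and that non-primary columns carry a zero-padded replica suffix. The column labels `server` and `seqnum`, the suffix format, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict

# Assumption: the primary is replica 0 (the thread mentions "replicaID == 0").
PRIMARY_REPLICA_ID = 0


def replica_column(base: str, replica_id: int) -> str:
    """Return the (hypothetical) column name for a replica.

    The primary keeps the existing column name; other replicas get a suffix.
    """
    if replica_id < 0:
        raise ValueError("replica_id must be non-negative")
    if replica_id == PRIMARY_REPLICA_ID:
        return base
    return f"{base}_{replica_id:04d}"


def record_location(meta_row: Dict[str, str], replica_id: int,
                    server: str, seqnum: int) -> None:
    """Record where a replica was opened, in the region's single meta row."""
    meta_row[replica_column("server", replica_id)] = server
    meta_row[replica_column("seqnum", replica_id)] = str(seqnum)


def replica_locations(meta_row: Dict[str, str]) -> Dict[int, str]:
    """Read every known replica location back from that one row."""
    locations = {}
    for column, value in meta_row.items():
        if column == "server":
            locations[PRIMARY_REPLICA_ID] = value
        elif column.startswith("server_"):
            locations[int(column[len("server_"):])] = value
    return dict(sorted(locations.items()))
```

With this shape, a reader needs only one row lookup to learn all replica locations, and the only special case is the choice of column name for the primary.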

bq. HRegionInfo now is overloaded. Before it was the info on a specific region. Now it is
trying to serve two purposes; its original intent and now too as a descriptor on the region-serving
'cluster' made of a primary and replicas. Let's avoid overloading what up to now has had a
clear role in the hbase model.

By doing it the way we have in the patch on HBASE-10347, it seems to reflect what's going
on - "HRI is a logical descriptor and a facade for a bunch of primary & replicas". That's
how we store things in the meta and how we reconstruct HRIs from the meta when needed.
There are possibly other approaches to doing this, e.g. extend HRegionInfo as, say, HRegionInfoReplica
and maintain the information about replicaID there, and/or change all the relevant methods
to accept HRegionInfoReplica and potentially return this as well in relevant situations. The
issue there is those approaches would be very intrusive and we would still need special cases
for replicaID == 0 or not. I'm not confident how much we would gain there. Is it too much to ask
to change the view of what an HRI means (to what you say above)? Anyway, let me ponder a bit
on this...

bq. The primary holds the 'pole position' being the name of the region in meta. The read replicas
are differently named with the 00001 and 00002, etc., interpolated into the middle of the
region name. I suppose doing it this way 'minimizes' the disturbance in the code base but
I'm worried this naming exception will only confuse though it minimizes change. Why would
the primary not be named like the replica regions?

I don't mind naming the primary regions similar to the replicas. This might mean tools that
currently depend on the name format would break even if the cluster is not deploying tables
with replicas (you guessed that response :-)). But yeah, if you go the full Paxos route, the
'primary' could be anyone in the replica-set, and there it makes sense for all members
in the set to have an index.
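
The naming trade-off can be sketched as follows. This is a rough illustration under stated assumptions, not the actual HBase region-name format: the `table,start_key,region_id.encoded.` layout, the `_NNNNN` replica part, and the `index_primary` flag are all hypothetical. The five-digit padding simply mirrors the "00001" and "00002" mentioned in the quoted question.

```python
def region_name(table: str, start_key: str, region_id: int, encoded: str,
                replica_id: int = 0, index_primary: bool = False) -> str:
    """Build an illustrative region name.

    With index_primary=False the primary keeps the unindexed name, so tools
    that parse existing names are unaffected. With index_primary=True every
    member of the replica set, including the primary, carries an index.
    """
    if replica_id == 0 and not index_primary:
        replica_part = ""
    else:
        replica_part = f"_{replica_id:05d}"
    return f"{table},{start_key},{region_id}{replica_part}.{encoded}."
```

Under the first scheme only the replicas are named differently; under the second the names are uniform, at the cost of changing the primary's name even on clusters that deploy no replicas.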

> HBase read high-availability using eventually consistent region replicas
> ------------------------------------------------------------------------
>                 Key: HBASE-10070
>                 URL: https://issues.apache.org/jira/browse/HBASE-10070
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>         Attachments: HighAvailabilityDesignforreadsApachedoc.pdf
> In the present HBase architecture, it is hard, probably impossible, to satisfy constraints
like 99th percentile of the reads will be served under 10 ms. One of the major factors that
affects this is the MTTR for regions. There are three phases in the MTTR process - detection,
assignment, and recovery. Of these, detection is usually the longest and is presently
on the order of 20-30 seconds. During this time, the clients would not be able to read the
region data.
> However, some clients will be better served if regions are available for reads during
recovery for doing eventually consistent reads. This will help with satisfying low latency
guarantees for some class of applications which can work with stale reads.
> For improving read availability, we propose a replicated read-only region serving design,
also referred to as secondary regions, or region shadows. Extending the current model of a region
being opened for reads and writes in a single region server, the region will also be opened
for reading in other region servers. The region server which hosts the region for reads and writes
(as in current case) will be declared as PRIMARY, while 0 or more region servers might be
hosting the region as SECONDARY. There may be more than one secondary (replica count >
> Will attach a design doc shortly which contains most of the details and some thoughts
about development approaches. Reviews are more than welcome. 
> We also have a proof of concept patch, which includes the master and region server side
of changes. Client side changes will be coming soon as well. 

This message was sent by Atlassian JIRA
