hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10070) HBase read high-availability using eventually consistent region replicas
Date Wed, 15 Jan 2014 22:16:31 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872683#comment-13872683 ]

Enis Soztutar commented on HBASE-10070:

bq. Should this be an architectural objective for HBase? Just asking. Our inspiration addressed
the 99th percentile in a layer above.
I think we should still focus on individual read latencies and try to minimize the jitter.
Obviously, things like hdfs quorum reads, etc are helpful in this respect, and we also plan
to incorporate that kind of work together with this. 
bq. We should work on this for sure. Native zk client immune to JVM pause has come up in the
past. Would help all around (as per the Vladimir comment above)
Agreed, but I think MTTR is orthogonal. In a world where each region is single-homed, there is
no way to get away without some timeout. We can try to reduce it in some cases, but a network
partition can always happen.

bq. Radical! Our DNA up to this has been all about giving the application a consistent view.
Yep, we are not proposing to change the default semantics, just giving the flexibility if
the tradeoffs are justifiable on the user side. 

bq. Could this be built as a layer on top of HBase rather than alter HBase core with shims
on clients and CPs?
I think the cleanest way is to bake this into HBase proper. These are some of the reasons
we went with this instead of proposing a layer above:
 - Regardless of eventual consistency for writes, replicated read-only tables or bulk-load-only
tables are one of the major design goals for this work as well. I would argue that this can and
should be addressed natively by HBase. The eventual consistency work just extends this
further on a per-use-case basis.
 - RPC failover + RPC cancellation is not possible (or at least not easy) to do from outside.
 - A higher-level API cannot easily tap into the LB to ensure that region replicas are not co-hosted.

bq. Do you envision this feature being always on? Or can it be disabled? If the former (or
latter actually), what implications for current read/write paths do you see?
The branch adds REGION_REPLICATION, which is a per-table conf, and the get/scan.setConsistency()
API, which is per request. The write path is not affected at all. On the read path, we do a
failover (backup) RPC similar to http://static.googleusercontent.com/media/research.google.com/en/us/people/jeff/Berkeley-Latency-Mar2012.pdf.
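
To make the backup-RPC idea concrete, here is a minimal, self-contained sketch of the general technique using only the Java standard library. It is not code from the branch and does not use any HBase API: the class name BackupRead, the read() method, the backupDelayMs parameter, and the convention that the first list entry is the primary are all hypothetical names chosen for illustration. The primary is called first; if it has not answered within the delay, the remaining replicas are called too, the first successful reply wins, and the outstanding calls are cancelled.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch of a backup (hedged) read. Not HBase client code. */
public final class BackupRead {
    private BackupRead() {}

    /**
     * Assumption: replicas.get(0) is the primary, the rest are secondaries.
     * Each Supplier starts one asynchronous call when invoked.
     */
    public static <T> CompletableFuture<T> read(
            List<Supplier<CompletableFuture<T>>> replicas,
            long backupDelayMs,
            ScheduledExecutorService timer) {
        CompletableFuture<T> result = new CompletableFuture<>();
        List<CompletableFuture<T>> inFlight = new CopyOnWriteArrayList<>();
        AtomicInteger failures = new AtomicInteger();

        // Start one call and wire its outcome into the shared result.
        Consumer<Supplier<CompletableFuture<T>>> start = replica -> {
            CompletableFuture<T> call = replica.get();
            inFlight.add(call);
            if (result.isDone()) {
                call.cancel(true);                        // lost the race already
            }
            call.whenComplete((value, error) -> {
                if (error == null) {
                    result.complete(value);               // first success wins
                } else if (failures.incrementAndGet() == replicas.size()) {
                    result.completeExceptionally(error);  // every replica failed
                }
            });
        };

        start.accept(replicas.get(0));                    // primary goes first

        // After the delay, fan out to the secondaries unless already answered.
        ScheduledFuture<?> backup = timer.schedule(() -> {
            if (!result.isDone()) {
                replicas.subList(1, replicas.size()).forEach(start);
            }
        }, backupDelayMs, TimeUnit.MILLISECONDS);

        // Once a result exists, stop the timer and cancel outstanding calls.
        // Note: cancel() here only marks the future as cancelled; a real RPC
        // client would also have to abort the request on the wire.
        result.whenComplete((value, error) -> {
            backup.cancel(false);
            inFlight.forEach(call -> call.cancel(true));
        });
        return result;
    }
}
```

The delay is the knob that trades extra load for tail latency: a short delay sends more duplicate requests, a long one leaves the caller waiting on a slow primary. In this sketch a reply from a secondary is simply returned as-is; a caller that cares would need some way to tell that the value may be stale.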

> HBase read high-availability using eventually consistent region replicas
> ------------------------------------------------------------------------
>                 Key: HBASE-10070
>                 URL: https://issues.apache.org/jira/browse/HBASE-10070
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>         Attachments: HighAvailabilityDesignforreadsApachedoc.pdf
> In the present HBase architecture, it is hard, probably impossible, to satisfy constraints
like 99th percentile of the reads will be served under 10 ms. One of the major factors that
affects this is the MTTR for regions. There are three phases in the MTTR process - detection,
assignment, and recovery. Of these, the detection is usually the longest and is presently
on the order of 20-30 seconds. During this time, the clients would not be able to read the
region data.
> However, some clients will be better served if regions are available for eventually
consistent reads during recovery. This will help with satisfying low latency
guarantees for the class of applications that can work with stale reads.
> To improve read availability, we propose a replicated read-only region serving design,
also referred to as secondary regions, or region shadows. Extending the current model of a region
being opened for reads and writes in a single region server, the region will also be opened
for reading in other region servers. The region server which hosts the region for reads and writes
(as in current case) will be declared as PRIMARY, while 0 or more region servers might be
hosting the region as SECONDARY. There may be more than one secondary (replica count >
> Will attach a design doc shortly which contains most of the details and some thoughts
about development approaches. Reviews are more than welcome. 
> We also have a proof-of-concept patch, which includes the master and region server side
of the changes. Client-side changes will be coming soon as well. 

This message was sent by Atlassian JIRA
