hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10070) HBase read high-availability using eventually consistent region replicas
Date Tue, 17 Dec 2013 01:33:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849978#comment-13849978 ]

Enis Soztutar commented on HBASE-10070:

Update: as discussed, we have put together a proof-of-concept implementation of a working
end-to-end scenario and would like to share it to get some early reviews and feedback. If you
are interested in the technical side of the changes, please check out the patch/branch. Please
note that the patches and the branch are far from clean and complete, but they are clean
enough to convey the scope of the changes and the areas they touch. The branch also contains
the end-to-end client-side APIs (except for execution policies). We will keep pushing changes
to this repo/branch until the patches are in a more stable state, at which point we will clean
up and reshuffle the patches into a form that is easier to review. Reviews and comments are
welcome at any stage.

The code is at the github repo https://github.com/enis/hbase.git, on the branch hbase-10070-demo.
This repository is based on 0.96.0 for now. I'll also attach a patch which contains all the
changes if you want to take a closer look.
This can be built with:
git clone git@github.com:enis/hbase.git
cd hbase
git checkout hbase-10070-demo
MAVEN_OPTS="-Xmx2g" mvn clean install package assembly:single -Dhadoop.profile=2.0 -DskipTests -Dmaven.javadoc.skip=true -Dhadoop-two.version=2.2.0
The generated tarball will be hbase-assembly/target/hbase-0.96.0-bin.tar.gz
The Hadoop version that should be used for real cluster testing is 2.2.0.

What's there in the repository:
1. Client (Shell) changes
The shell has been modified so that tables with more than one replica per region can be created:
  create 't1', 'f1', {REGION_REPLICATION => 2}
One can also 'describe' a table; the response will include the replica configuration:
  describe 't1'
One can do a 'get' with the eventual-consistency flag set (we haven't implemented the
consistency semantics for the 'scan' family in this drop):
  get 't2','r6',{"EVENTUAL_CONSISTENCY" => true}
(Note the quotes around the EVENTUAL_CONSISTENCY string; we will fix this soon to work
without the quotes.)

Outside the shell, the client API for opting into eventual consistency is Get.setConsistency,
and the returned Result can be queried via Result.isStale to check whether it was served stale.
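In Java, that client-side API would be used roughly as follows. This fragment is illustrative
only: it assumes a running cluster, an already-open Connection named conn, and the
Consistency.TIMELINE enum value, which is the name that eventually shipped in Apache HBase and
may differ in this demo branch.

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch only: `conn` is assumed to be an open org.apache.hadoop.hbase.client.Connection.
Table table = conn.getTable(TableName.valueOf("t1"));
Get get = new Get(Bytes.toBytes("r6"));
get.setConsistency(Consistency.TIMELINE); // opt in to possibly-stale reads from a replica
Result result = table.get(get);
if (result.isStale()) {
  // The read was served by a secondary replica and may lag the primary.
}
```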

2. Master changes
The main change here is the creation and management of replica HRegionInfo objects. The other
change makes the StochasticLoadBalancer aware of the replicas: during the assignment process,
the AssignmentManager consults the balancer for an assignment plan, and the
StochasticLoadBalancer ensures the plan honors the constraint that a primary and its
secondaries are not assigned to the same server, nor to the same rack (if more than one rack
is configured).
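The placement constraint can be illustrated with a small, self-contained sketch. This is not
the actual StochasticLoadBalancer code; the class name, the "_replicaN" naming convention, and
the map-based inputs are made up for illustration.

```java
import java.util.*;

// Toy model of the replica-placement constraint: no two replicas of the same
// region on the same server, and, when more than one rack exists, not on the
// same rack either.
public class ReplicaPlacementCheck {

    /** assignment: replica id ("region_replicaN") -> server; serverRack: server -> rack */
    public static boolean isValidPlacement(Map<String, String> assignment,
                                           Map<String, String> serverRack) {
        // The rack constraint only applies when more than one rack is configured.
        boolean multiRack = new HashSet<>(serverRack.values()).size() > 1;

        // Group the servers hosting each region's replicas (strip the "_replicaN" suffix).
        Map<String, List<String>> serversByRegion = new HashMap<>();
        for (Map.Entry<String, String> e : assignment.entrySet()) {
            String region = e.getKey().split("_replica")[0];
            serversByRegion.computeIfAbsent(region, k -> new ArrayList<>()).add(e.getValue());
        }

        for (List<String> servers : serversByRegion.values()) {
            // Two replicas on the same server: invalid.
            if (new HashSet<>(servers).size() < servers.size()) return false;
            if (multiRack) {
                // Two replicas on the same rack: invalid when racks are configured.
                Set<String> racks = new HashSet<>();
                for (String s : servers) racks.add(serverRack.get(s));
                if (racks.size() < servers.size()) return false;
            }
        }
        return true;
    }
}
```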

3. RegionServer changes
The main change here is the ability to open regions in read-only mode. The other change is the
periodic refresh of store files on the secondaries, which is controlled by a configuration
setting (disabled by default).
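A plausible hbase-site.xml snippet for enabling the refresh is shown below. The property name
is an assumption borrowed from the name later Apache HBase releases use for this feature; the
demo branch may use a different key.

```xml
<!-- How often (ms) a secondary re-scans the primary's store files.
     Key name is an assumption from later HBase releases; 0 (the default) disables it. -->
<property>
  <name>hbase.regionserver.storefile.refresh.period</name>
  <value>30000</value>
</property>
```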
4. UI changes
The UIs showing the tables' status and the regions' status have been modified to indicate
whether they have replicas.

There are unit tests - TestMasterReplicaRegions, TestRegionReplicas, TestReplicasClient,
TestBaseLoadBalancer and some others.

There is also a manual test scenario to exercise reads served from a secondary replica:
1. Create a cluster of at least two nodes.
2. Create a table with replication 2. From the HBase shell:
  create 't1', 'f1', {REGION_REPLICATION => 2}
3. Arrange the regions so that the primary region is not co-located with the meta or namespace
 regions. You can use move commands in the HBase shell for that:
 move '4392870ae8ef482406c272eec0312a02', ',60020,1387069812919'

4. From the shell, do a couple of puts and then 'flush' the table:
hbase(main):005:0> for i in 1..100
hbase(main):006:1> put 't1', "r#{i}", 'f1:c1', i
hbase(main):007:1> end
hbase(main):009:0> flush 't1'
5. Suspend the region server hosting the primary region replica by sending a kill -STOP signal
from bash:
kill -STOP <pid_of_region_server>
6. Get a row from the table with the eventual-consistency flag set to true:
 get 't1','r6',{"EVENTUAL_CONSISTENCY" => true}
7. A 'put' to the table should fail.
Steps (6) and (7) should be done quickly enough, otherwise the master will recover the
region!! (The default ZK session timeout is 90 seconds.)

> HBase read high-availability using eventually consistent region replicas
> ------------------------------------------------------------------------
>                 Key: HBASE-10070
>                 URL: https://issues.apache.org/jira/browse/HBASE-10070
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>         Attachments: HighAvailabilityDesignforreadsApachedoc.pdf
> In the present HBase architecture, it is hard, probably impossible, to satisfy constraints
like a 99th percentile of reads served under 10 ms. One of the major factors affecting this
is the MTTR for regions. There are three phases in the MTTR process - detection, assignment,
and recovery. Of these, detection is usually the longest and is presently on the order of
20-30 seconds. During this time, clients are not able to read the region data.
> However, some clients would be better served if regions remained available for eventually
consistent reads during recovery. This would help satisfy low-latency guarantees for the
class of applications that can work with stale reads.
> To improve read availability, we propose a replicated read-only region serving design,
also referred to as secondary regions, or region shadows. Extending the current model of a
region being opened for reads and writes in a single region server, the region will also be
opened for reading in other region servers. The region server which hosts the region for
reads and writes (as in the current case) will be declared PRIMARY, while 0 or more region
servers may host the region as SECONDARY. There may be more than one secondary (replica count > 2).
> Will attach a design doc shortly which contains most of the details and some thoughts
about development approaches. Reviews are more than welcome. 
> We also have a proof of concept patch, which includes the master and region server side
of the changes. Client-side changes will be coming soon as well.
