hbase-user mailing list archives

From Jerry Lam <chiling...@gmail.com>
Subject Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write
Date Mon, 23 Apr 2012 12:37:42 GMT
Hi Lars:

I will try your suggestion today with master-slave replication enabled only from Cluster A -> Cluster B.
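For the one-directional test, the HBase shell steps would be roughly as follows (a sketch only; the peer id '1' and the ZooKeeper quorum string are placeholders for our actual configuration):

```
# On Cluster B: stop replicating back to Cluster A
remove_peer '1'

# On Cluster A: keep (or re-add) Cluster B as the only peer
# (argument format: "quorum hosts:client port:znode parent")
add_peer '1', "zkB1,zkB2,zkB3:2181:/hbase"
```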
Last Friday, I tried to limit the variability/the moving parts of the replication components. I reduced Cluster B to a single regionserver and had Cluster A replicate data from one region only, with region splitting disabled (therefore I had a 1-to-1 region replication setup). During the benchmark, I moved the region between different regionservers in Cluster A (note there are still 3 regionservers in Cluster A). I ran this test 5 times and no data were lost. Does this mean something? My feeling is that there are some glitches/corner cases that have not been covered in cyclic replication (or HBase replication in general). Note that this happens only when the load is high.
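For reference, the manual region moves during the benchmark were done from the HBase shell; a sketch (the encoded region name and the target server name are placeholders for our actual values):

```
# Disable the balancer so only our manual moves relocate regions
balance_switch false

# Move the region to a specific regionserver
# (arguments: encoded region name, target server as 'host,port,startcode')
move 'ENCODED_REGION_NAME', 'host2,60020,1334892311876'
```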

By the way, why do we need a ZooKeeper ensemble not managed by HBase for replication to work (as described in the HBase documentation)?

Best Regards,


On 2012-04-20, at 7:08 PM, lars hofhansl wrote:

> I see.
> Does this only happen when cyclic replication is enabled in this way (i.e. master <-> master replication)?
> Replicating back does add some overhead, because the replicator needs to filter edits to keep them from being replicated back to the originator, but I would not have thought that would cause any issues.
> Could you run the same test once with replication only enabled from ClusterA -> ClusterB?
> Thanks.
> -- Lars
> ----- Original Message -----
> From: Jerry Lam <chilinglam@gmail.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Cc: 
> Sent: Friday, April 20, 2012 3:43 PM
> Subject: Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write
> Hi Himanshu:
> I'm using HBase 0.92.1 and Hadoop 1.0.1, migrating from HBase 0.90.4 and Hadoop 0.20 with the append feature.
> It is one-way replication (cluster A to cluster B) with cyclic replication enabled (i.e. add_peer configured on each cluster pointing at the other).
> Best Regards,
> Jerry
> Sent from my iPad
> On 2012-04-20, at 10:23, Himanshu Vashishtha <hvashish@cs.ualberta.ca> wrote:
>> Hello Jerry,
>> Which HBase version?
>> You are not "using" cyclic replication? It's simple one-way replication, right?
>> Thanks,
>> Himanshu
>> On Fri, Apr 20, 2012 at 5:38 AM, Jerry Lam <chilinglam@gmail.com> wrote:
>>> Hi HBase community:
>>> We have been testing cyclic replication for 1 week. The basic functionality seems to work as described in the documentation; however, when we increased the write workload, replication started to miss data (i.e. some data are not replicated to the other cluster). We have narrowed it down to a scenario in which we can reproduce the problem quite consistently. Here it is:
>>> -----------------------------
>>> Setup:
>>> - We have set up 2 clusters (cluster A and cluster B) with identical size in terms of number of nodes and configuration: 3 regionservers sit on top of 3 datanodes.
>>> - Cyclic replication is enabled.
>>> - We use YCSB to generate load on HBase; the workload is very similar to workloada:
>>> recordcount=200000
>>> operationcount=200000
>>> workload=com.yahoo.ycsb.workloads.CoreWorkload
>>> fieldcount=1
>>> fieldlength=25000
>>> readallfields=true
>>> writeallfields=true
>>> readproportion=0
>>> updateproportion=1
>>> scanproportion=0
>>> insertproportion=0
>>> requestdistribution=uniform
>>> - Records are inserted into Cluster A. After the benchmark is done, we wait until all data are replicated to Cluster B, then use the verifyrep mapreduce job for validation.
>>> - Data are deleted from both tables (truncate 'tablename') before a new experiment is started.
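>>> The validation step can be sketched as follows (run on Cluster A; the peer id '1' and the table name 'usertable' are placeholders for our actual configuration):

```
# Run the verifyrep MapReduce job from Cluster A
# (arguments: peer id as configured with add_peer, then the table name)
hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication 1 usertable

# Compare the GOODROWS / BADROWS counters in the job output
```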
>>> Scenario:
>>> When we increase the number of threads until the throughput of the cluster maxes out, we see some data missing in Cluster B (total count != 200000) even though Cluster A clearly has them all. This happens even when region splitting is disabled in both clusters (it happens more often when region splits occur). To gain more control over what is happening, we then disabled the load balancer so the region (which holds the data being replicated) would not relocate to another regionserver during the benchmark. The situation improved a lot: we saw no missing data in 5 consecutive runs. Finally, we moved the region from one regionserver to another during the benchmark to see if the problem would reappear, and it did.
>>> We believe the issue could be related to region splitting and load balancing during intensive writes; the HBase replication strategy may not yet cover those corner cases.
>>> Can someone take a look at it and suggest some ways to work around this?
>>> Thanks~
>>> Jerry
