hbase-dev mailing list archives

From Lars George <lars.geo...@gmail.com>
Subject Re: HBase Replication use cases
Date Sat, 14 Apr 2012 10:38:07 GMT

I was after table consistency. As soon as you break an entity group into more
than one row, you might have the problem that those rows span two regions. Now
assume row 1 in region 1 is updated, but row 2 in region 2 is not because
replication lagged, and now the originating cluster is a goner.

I was asking how people handle this. Maybe your custom region split rules can
help here, enforcing that the rows of an entity group (I am using the Megastore
notation here) are all on one server and therefore *should* be updated
together. This would also require your cross-row transactions to use single
WAL entries for multiple rows - that is what this does, right?
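The key-design idea behind that split rule can be sketched in a toy Python model (this is not HBase code; the prefix length and key layout are made-up illustrations): if every row of an entity group shares a fixed-length key prefix, the rows sort contiguously under byte-wise ordering, so a prefix-aware split policy can always cut at a group boundary and never strand a group across two regions.

```python
# Toy sketch (not HBase API): entity-group rows share a fixed-length key
# prefix, so under byte-wise sorting they form one contiguous run that a
# prefix-aware split policy can keep inside a single region.

GROUP_PREFIX_LEN = 8  # hypothetical fixed prefix width


def row_key(group_id: str, member: str) -> bytes:
    """Compose a row key: fixed-width group prefix + member suffix."""
    prefix = group_id.encode().ljust(GROUP_PREFIX_LEN, b"\x00")[:GROUP_PREFIX_LEN]
    return prefix + member.encode()


def groups_contiguous(sorted_keys) -> bool:
    """Verify that keys sharing a group prefix never interleave with
    keys of another group, i.e. each group is one contiguous run."""
    seen = []
    for k in sorted_keys:
        p = k[:GROUP_PREFIX_LEN]
        if seen and seen[-1] != p and p in seen:
            return False  # a group reappeared after another group started
        if not seen or seen[-1] != p:
            seen.append(p)
    return True


# Two entity groups, two rows each: sorting clusters each group together.
keys = sorted(row_key(g, m) for g in ("user77", "user42") for m in ("b", "a"))
```

With keys laid out like this, a split point chosen at a prefix boundary keeps every entity group whole on one region server.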

But on a larger scale, I was hoping that we could add recovery points to an
entire table, so that when replication stops and you promote the slave to be
the master, you can delete all partial updates (sure, they are entire row
updates, so how do you roll those back?) to ensure you have consistency. It is
less about losing data and more about keeping the table consistent.
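One way to picture such a recovery point is a toy Python model (hypothetical, not existing HBase behavior; all names are invented): each region server acknowledges checkpoints as it ships WAL entries, the cluster-wide recovery point is the minimum checkpoint acknowledged by all servers, and on failover every shipped edit with a newer sequence ID is discarded.

```python
# Hypothetical sketch of the table-wide checkpoint idea, not existing
# HBase behavior: on failover, roll the sink back to the newest point
# that every region server had acknowledged.


def last_consistent_checkpoint(acked):
    """acked maps region server -> highest checkpoint it acknowledged.
    The cluster-wide recovery point is the minimum across servers."""
    return min(acked.values())


def roll_back(entries, checkpoint):
    """Discard shipped edits whose source sequence ID is beyond the
    recovery point, restoring a consistent table-wide state."""
    return [e for e in entries if e["seq"] <= checkpoint]


# rs2 lagged, so the recovery point is its checkpoint (95); edits shipped
# past that point by faster servers are deleted on failover.
acked = {"rs1": 120, "rs2": 95, "rs3": 110}
cp = last_consistent_checkpoint(acked)
entries = [{"seq": 90, "row": "a"}, {"seq": 100, "row": "b"}, {"seq": 95, "row": "c"}]
survivors = roll_back(entries, cp)
```

The ZooKeeper part of the proposal would be exactly the `acked` map: a place where all region servers agree on the last checkpoint everyone has reached.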

Is this not a concern for you guys? Do you have a schema that does not have this issue, for
example, are you forcing all entity groups to be a single row?

Just curious.


On Apr 14, 2012, at 2:31 AM, lars hofhansl wrote:

> Ah yes. Good point. You're absolutely right.
> ----- Original Message -----
> From: Himanshu Vashishtha <hvashish@cs.ualberta.ca>
> To: dev@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
> Cc: 
> Sent: Friday, April 13, 2012 5:18 PM
> Subject: Re: HBase Replication use cases
> @Lars H:
> WALEdits are per transaction (aka per row). And we do ship WALEdits, so at
> the slave, the end state will be the end of some transaction the master has
> seen?
> The idea is that the unit of atomicity is a WALEdit. Can we have a scenario
> where the end state does not correspond to the state at the end of any of
> the row transactions on the master? It would be good to know.
> Thanks.
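The per-WALEdit atomicity described above, and the entity-group gap Lars is worried about, can be shown in a toy Python model (plain illustration, not HBase code; names and structure are made up): the shipping unit holds all cells of one row's mutation, so each row arrives whole, but two rows of one group are two separate units and lag can land between them.

```python
# Toy model (not HBase code): the replication shipping unit ("wal_edit"
# here) carries all cells of one row mutation, so a single row is always
# replicated atomically - but a two-row entity group is two units, and
# replication lag can stop between them.


def make_wal_edit(row, cells, seq):
    """One per-row transaction: every cell of the mutation, one seq ID."""
    return {"row": row, "cells": cells, "seq": seq}


def ship(edits, up_to_seq):
    """Simulate replication lag: only edits up to some seq ID arrive."""
    return [e for e in edits if e["seq"] <= up_to_seq]


# One entity group spread over two rows, updated together on the master.
edits = [
    make_wal_edit("group1/row1", {"col": "v1"}, seq=1),
    make_wal_edit("group1/row2", {"col": "v1"}, seq=2),
]
arrived = ship(edits, up_to_seq=1)
# row1's edit arrives whole (all its cells together), but row2's edit is
# missing entirely: the slave is consistent per row, not per entity group.
```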
> On Fri, Apr 13, 2012 at 4:54 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
>> Hey Lars,
>> in a DR scenario (i.e. a DC falls into the ocean) we have SLAs that allow
>> for a certain amount of data loss.
>> The main concern here would be that "rows" could be in a state that does
>> not correspond to the state at the end of any of the row transactions in
>> the source system, right?
>> Or are you referring to even cross table consistency?
>> -- Lars
>> ----- Original Message -----
>> From: Lars George <lars.george@gmail.com>
>> To: dev@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com>
>> Cc:
>> Sent: Thursday, April 12, 2012 11:13 PM
>> Subject: Re: HBase Replication use cases
>> Hi Lars,
>> I am really curious how you will handle the possible (or, say, likely)
>> inconsistencies between regions of the same table in a DR situation. This
>> seems to be solely application-layer logic, but on the other hand a lot of
>> people will need something here. So the question is, could this be added
>> to the code? The idea is, could we hint to the replication what schema we
>> are using, so that it can ship the logs somewhat "transactionally" on the
>> receiving end? For example, it could record sequence IDs or even
>> timestamps, and when the originating cluster fails, a mechanism on the
>> receiving end deletes all inconsistent changes, bringing it back to a
>> well-known checkpoint. Replication does ship the WAL edits, so this might
>> be all that is needed, plus some ZooKeeper magic to synchronize the
>> checkpoint across the region servers?
>> Maybe I am seeing this wrong, but how else would you recover in the case
>> of a DR situation?
>> Cheers,
>> Lars
>> On Apr 12, 2012, at 11:50 PM, lars hofhansl wrote:
>>> Thanks Himanshu,
>>> we're planning to use Replication for cross-DC replication for DR (and
>>> we added a bunch of stuff and fixed bugs in replication).
>>> We'll have it always on (and only use stop/start_peer, which is new in
>>> 0.94+, to temporarily stop replication, rather than stop/start_replication)
>>> HBASE-2611 is a problem. We did not have time recently to work on this.
>>> i) and ii) can be worked around by forcing a log roll on all region
>>> servers after replication was enabled. Replication would be considered
>>> started after the logs were rolled... But that is quite annoying.
>>> Is iii) still a problem in 0.92+? I thought we fixed that together with a).
>>> -- Lars
>>> ________________________________
>>> From: Himanshu Vashishtha <hvashish@cs.ualberta.ca>
>>> To: dev@hbase.apache.org
>>> Sent: Thursday, April 12, 2012 12:11 PM
>>> Subject: HBase Replication use cases
>>> Hello All,
>>> I have been doing testing on the HBase replication (0.90.4, and 0.92 variants).
>>> Here are some of the findings:
>>> a) 0.90+ is not that great at handling znode changes; in an
>>> ongoing replication, if I delete a peer and a region server goes to
>>> the znode to update the log status, the region server aborts itself
>>> when it sees the missing znode.
>>> RecoverableZooKeeper seems to have fixed this in 0.92+?
>>> 0.92 has a lot of new features (start/stop handles, master-master,
>>> cyclic replication). But there are corner cases with the start/stop
>>> switches.
>>> i) A log is enqueued when the replication state is set to true. When we
>>> start the cluster, it is true and the starting region server takes the
>>> new log into the queue. If I do a stop_replication, then there is a log
>>> roll, and then I do a start_replication, the current log will not be
>>> replicated, as it has missed the opportunity of being added to the queue.
>>> ii) If I _start_ a region server when the replication state is set to
>>> false, its log will not be added to the queue. Now, if I do a
>>> start_replication, its log will not be replicated.
>>> iii) Removing a peer doesn't make the master region server abort, but if
>>> zk is down and there is a log roll, it will abort. Not a serious one, as
>>> zk is down so the cluster is not healthy anyway.
>>> I was looking for jiras (including 2611), and stumbled upon 2223. I
>>> don't think there is anything like the time-based partition behavior
>>> mentioned in the jira description. The patch does have a lot of other
>>> nice things, though, which are indeed in the existing code. Please
>>> correct me if I missed anything.
>>> Having said that, I wonder how other folks out there use it: their
>>> experience, and the common issues (minor and major) they come across.
>>> I did find a ppt by Jean-Daniel at OSCON mentioning its use in SU
>>> production.
>>> I plan to file jiras for the above ones and will start digging in.
>>> Looking forward to your responses.
>>> Thanks,
>>> Himanshu
