hbase-dev mailing list archives

From Jan Van Besien <ja...@ngdata.com>
Subject HBase replication: "in order semantics"
Date Fri, 09 Nov 2012 13:25:05 GMT
Hi,

I am trying to understand in detail how HBase replication works.

First of all, I assume that for replication to work correctly, all 
edits must be replayed on the replica HBase cluster in the same order 
as they were executed on the source HBase cluster. Correct?

If so, I am trying to understand how that is guaranteed.

I can see that this holds trivially as long as the edits are read from 
the HLog in order and that is used as the source for replication.

However, what if a region is moved to another region server? Can we not 
end up in the following situation?

1) region A is originally hosted by region server X.
2) replication in region server X is replicating edits of region A. Say 
that it is lagging behind a bit, so it has a number of edits still to do.
3) region A is moved to region server Y.
4) edits for region A arrive on region server Y, and replication on 
region server Y starts replicating them
5) replication in region server X is still busy with some leftover 
edits from region A, so these are replicated out of order
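
To make the scenario concrete, here is a minimal toy model of steps 1-5 in 
Python. It is only a sketch of my concern, not HBase code: the names 
(simulate, ship, log_x, log_y) are made up, each region server is assumed 
to ship its own log independently, and the replica is assumed to apply 
edits purely in arrival order (cell timestamps are deliberately ignored).

```python
from collections import deque


def simulate(lag_on_x):
    """Toy model: region servers X and Y each ship their own log independently.

    lag_on_x is the number of region A edits that X has not yet shipped
    at the moment the region is moved to Y (hypothetical parameter).
    """
    # Edits to one row of region A, in the order applied at the source.
    log_x = deque([("row1", "v1"), ("row1", "v2")])  # written while A was on X
    log_y = deque([("row1", "v3")])                  # written after A moved to Y

    replica = {}   # state of the replica cluster (last arrival wins)
    arrival = []   # order in which edits reach the replica

    def ship(log, n):
        # Ship up to n edits from the head of the given log to the replica.
        for _ in range(min(n, len(log))):
            row, value = log.popleft()
            replica[row] = value
            arrival.append(value)

    # Step 2: X ships everything except its lagging edits.
    ship(log_x, len(log_x) - lag_on_x)
    # Steps 3-4: region A moves to Y, and Y starts shipping right away.
    ship(log_y, len(log_y))
    # Step 5: X finally drains its leftover edits for region A.
    ship(log_x, len(log_x))
    return arrival, replica


print(simulate(0))  # X had no lag when the region moved
print(simulate(1))  # X still had one edit queued when the region moved
```

In this model, simulate(0) delivers v1, v2, v3 in source order, while 
simulate(1) delivers v1, v3, v2, so the replica ends up holding v2 even 
though v3 was the last write on the source.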

So the question really is whether there is a mechanism to prevent the 
replication source from reading edits in a HLog for a region that was 
meanwhile already moved to another region server.

It could be that it has something to do with log splitting and recovery, 
but I was under the assumption that HBase only splits logs in case of 
recovery and/or master restart, and not in case of region moves.

I hope somebody can shed some light on this topic.

Thanks,
Jan
