hbase-issues mailing list archives

From "Jeffrey Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
Date Mon, 11 Mar 2013 06:53:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598579#comment-13598579 ]

Jeffrey Zhong commented on HBASE-7709:
--------------------------------------

Continuing with more proposals...

The disadvantages of option #2 are as obvious as its advantages. Even in cases (probably the
majority of replication setups) where there is no loop at all, just a long replication chain,
the downstream RSs still need to replay and store a long list of clusterIds for each WALEdit.
Encoding may help compress the clusterId list on the sending side, but not in storage.

Let me first show how we can do better than option #2, and then describe an alternative that
works well in most cases without any extra storage. Both options are good IMHO.

As we know, a loop is caused by a back-edge in the replication graph. We can roughly identify
back-edges by checking whether a region server sees more than one path from the same source;
if it does, a loop is likely. Only then do we need to append the current cluster Id to the
source cluster Id of a WAL edit for later loop detection. Therefore, in most cases (no loop,
or a simple master-master-master… cycle setup) we don't need to store a long clusterId list at all.

I call the above updated option #2 "adaptive option #2": it only needs more storage when there
is an actual need. We can implement it as follows:

1) Maintain a hash string PathChecksum (= Hash(receivedPathChecksum + current clusterId)) for
each WAL edit.
2) Each replaying & receiving region server maintains an in-memory ClusterDistanceMap
<clusterId, Set<PathChecksums seen so far>>.
  2.a Every time it sees a new PathChecksum (one that isn't in Set<PathChecksums>), it adds the
new PathChecksum into Set<PathChecksums>; it drops a stale PathChecksum from Set<PathChecksums>
when it expires, i.e. when the region server hasn't seen any data coming in from that path for
a configurable time period.
3) When Set<PathChecksums>'s size > 1, append the current cluster id to the WAL edit for later
replication loop detection.

We can use the top 8 bytes of the clusterId field to store the PathChecksum and the remaining
8 bytes as a hash of the original clusterId value. With this update, we only pay the extra cost
when there is an actual need; a sketch of the bookkeeping follows below.
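
To make the bookkeeping concrete, here is a minimal sketch of steps 1) through 3). It is not
actual HBase code; the names (PathChecksumTracker, recordAndCheck) are made up for illustration:

{code:java}
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch only: tracks the PathChecksums seen per source cluster
 * and flags when more than one live path exists (a likely back-edge/loop).
 */
public class PathChecksumTracker {

  /** ClusterDistanceMap: source clusterId -> (PathChecksum -> last time it was seen). */
  private final Map<UUID, Map<Long, Long>> clusterDistanceMap = new ConcurrentHashMap<>();

  /** Drop a path we haven't seen data from for this long (configurable). */
  private final long expiryMillis;

  public PathChecksumTracker(long expiryMillis) {
    this.expiryMillis = expiryMillis;
  }

  /** Step 1: PathChecksum = Hash(receivedPathChecksum + current clusterId). */
  public static long nextPathChecksum(long receivedPathChecksum, UUID currentClusterId) {
    long h = receivedPathChecksum;
    h = h * 31 + currentClusterId.getMostSignificantBits();
    h = h * 31 + currentClusterId.getLeastSignificantBits();
    return h;
  }

  /**
   * Steps 2 and 3: record the checksum for this source cluster, expire stale paths,
   * and return true when more than one live path is known, i.e. the current
   * clusterId should be appended to the edit for loop detection.
   */
  public boolean recordAndCheck(UUID sourceClusterId, long pathChecksum) {
    long now = System.currentTimeMillis();
    Map<Long, Long> paths =
        clusterDistanceMap.computeIfAbsent(sourceClusterId, k -> new ConcurrentHashMap<>());
    paths.put(pathChecksum, now);                                        // 2.a: add or refresh
    paths.values().removeIf(lastSeen -> now - lastSeen > expiryMillis);  // 2.a: expire stale paths
    return paths.size() > 1;                                             // 3: more than one path
  }
}
{code}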


As you can imagine, real-life replication setups normally don't involve any complicated graph,
so option #2 spends extra storage on situations that most likely won't happen. Therefore, in the
following I want to propose a solution that doesn't change the current WAL format and is good
for most cases, including the situation that triggered this JIRA. In extreme cases it reports
an error for the infinite loop.

The new proposal (option #6) is as follows:
1) Maintain a hash string PathChecksum (= Hash(receivedPathChecksum + current clusterId)) for
each WAL edit.
2) Each replaying & receiving region server maintains an in-memory ClusterDistanceMap
<clusterId, Set<PathChecksums seen so far>>.
  2.a Every time it sees a new PathChecksum (one that isn't in Set<PathChecksums>), it adds the
new PathChecksum into Set<PathChecksums>; it drops a stale PathChecksum from Set<PathChecksums>
when it expires, i.e. when the region server hasn't seen any data coming in from that path for
a configurable time period.
3) When Set<PathChecksums>'s size > 1, reset the WAL edit's clusterId to the current clusterId
and increment a counter (ResetCounter) that marks how many times the edit's clusterId has been
reset.
4) When ResetCounter > 64, report an error (we could drop the WAL edits as well, because
ResetCounter > 64 means we have at least 64 back-edges or duplicated sources, and it would take
an extremely convoluted setup to hit that).
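
Here is a minimal sketch of how steps 3) and 4) could look on the receiving side, reusing the
hypothetical PathChecksumTracker from the earlier sketch; the EditLoopInfo holder and the method
names are placeholders, not existing HBase APIs:

{code:java}
import java.util.UUID;

/**
 * Illustrative sketch of option #6, steps 3 and 4: instead of appending
 * clusterIds, reset the edit's clusterId and count how often that happened.
 */
public class LoopGuard {

  /** Threshold from step 4. */
  static final int MAX_RESETS = 64;

  /** Placeholder for the per-edit replication metadata carried in the WAL. */
  static class EditLoopInfo {
    UUID clusterId;      // source clusterId of the WAL edit
    int resetCounter;    // how many times the clusterId has been reset
  }

  private final PathChecksumTracker tracker;   // from the earlier sketch
  private final UUID currentClusterId;

  LoopGuard(PathChecksumTracker tracker, UUID currentClusterId) {
    this.tracker = tracker;
    this.currentClusterId = currentClusterId;
  }

  /** Returns false when the edit should be rejected as a likely infinite loop. */
  boolean onIncomingEdit(EditLoopInfo edit, long receivedPathChecksum) {
    long checksum =
        PathChecksumTracker.nextPathChecksum(receivedPathChecksum, currentClusterId);
    boolean multiplePaths = tracker.recordAndCheck(edit.clusterId, checksum);
    if (multiplePaths) {
      edit.clusterId = currentClusterId;     // step 3: reset to the current cluster
      edit.resetCounter++;                   // step 3: remember the reset
    }
    return edit.resetCounter <= MAX_RESETS;  // step 4: report/drop when exceeded
  }
}
{code}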

The advantage of the above option is that it can possibly reuse the existing HLog format while
still preventing the loop situations that occur in real-life cases.

To implement this:
1) we can introduce a new version (3) in HLogKey
2) use the top 7 bytes of the UUID to store the PathChecksum, the next 1 byte to store RD, and
the remaining 8 bytes as a hash of the original 16-byte UUID value, without compromising
uniqueness: in most cases we have tens of clusters involved in replication, so the collision
probability is less than 10^-18 (a sketch of this byte layout follows below)
3) we can introduce a configuration setting defaulting to false (suggested by Lars). After we
roll out the feature we can turn it on, and turn it off in a revert scenario.
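
As a rough illustration of point 2), the 16 UUID bytes could be packed as below. This is not
the actual HLogKey change; in particular, treating "RD" as the ResetCounter from step 3 and the
hash function used here are my assumptions:

{code:java}
import java.util.UUID;

/** Illustrative packing of PathChecksum, RD and an origin-UUID hash into 16 bytes. */
public final class ClusterIdCodec {

  /**
   * Top 7 bytes: PathChecksum (only its low 56 bits are kept),
   * next 1 byte: RD (assumed here to be the ResetCounter from step 3),
   * low 8 bytes: a 64-bit hash of the original 16-byte cluster UUID.
   */
  public static UUID encode(long pathChecksum, int rd, UUID originClusterId) {
    long msb = (pathChecksum << 8) | (rd & 0xFFL);   // 7 bytes checksum + 1 byte RD
    long lsb = hashOrigin(originClusterId);          // 8-byte hash of the origin UUID
    return new UUID(msb, lsb);
  }

  public static long extractPathChecksum(UUID encoded) {
    return encoded.getMostSignificantBits() >>> 8;   // drop the RD byte
  }

  public static int extractRd(UUID encoded) {
    return (int) (encoded.getMostSignificantBits() & 0xFFL);
  }

  /** Simple 64-bit mix of the two UUID halves; collision chance per pair is about 2^-64. */
  static long hashOrigin(UUID origin) {
    long h = origin.getMostSignificantBits() * 0x9E3779B97F4A7C15L;
    h ^= Long.rotateLeft(origin.getLeastSignificantBits(), 31) * 0xC2B2AE3D27D4EB4FL;
    return h;
  }
}
{code}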

Thanks,
-Jeffrey

                
> Infinite loop possible in Master/Master replication
> ---------------------------------------------------
>
>                 Key: HBASE-7709
>                 URL: https://issues.apache.org/jira/browse/HBASE-7709
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.95.0, 0.94.6
>            Reporter: Lars Hofhansl
>             Fix For: 0.95.0, 0.94.7
>
>
> We just discovered the following scenario:
> # Cluster A and B are set up in master/master replication
> # By accident we had Cluster C replicate to Cluster A.
> Now all edits originating from C will be bouncing between A and B. Forever!
> The reason is that when the edit comes in from C the cluster ID is already set and won't
> be reset.
> We have a couple of options here:
> # Optionally only support master/master (not cycles of more than two clusters). In that
> case we can always reset the cluster ID in the ReplicationSource. That means that cycles
> of more than 2 clusters will have the data cycle forever. This is the only option that
> requires no changes in the HLog format.
> # Instead of a single cluster id per edit, maintain an (unordered) set of cluster ids that
> have seen this edit. Then in ReplicationSource we drop any edit that the sink has seen
> already. This is the cleanest approach, but it might need a lot of data stored per edit if
> there are many clusters involved.
> # Maintain a configurable counter of the maximum cycle size we want to support. Could
> default to 10 (even maybe even just). Store a hop-count in the WAL and the ReplicationSource
> increases that hop-count on each hop. If we're over the max, just drop the edit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
