Date: Tue, 29 Jan 2013 20:25:13 +0000 (UTC)
From: "Ian Varley (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication

[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565760#comment-13565760 ]

Ian Varley commented on HBASE-7709:
-----------------------------------

(Because cycles > 2 are still fine, they just have to include all nodes. It can go A -> B -> C -> A; when an edit from A gets to C, it won't re-send to A, and the cycle will stop. The problem is just when it's a cycle from A -> (B -> C -> B).)
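The argument in the comment can be checked with a toy simulation. This is a minimal sketch, not HBase code: the class `SingleClusterIdReplication`, the `simulate` method and the peer-map representation of the topology are all hypothetical. It assumes the behavior described in the issue: each edit carries a single origin cluster ID, a sink keeps that ID instead of resetting it, and a source only skips the peer whose ID matches the edit's ID.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

public class SingleClusterIdReplication {
    // One pending delivery: the cluster holding the edit and the origin ID stamped on it.
    record Delivery(String cluster, String originId) {}

    /**
     * Hypothetical model: replicates one edit written at `origin`.
     * Returns how many times a sink applied the edit, or -1 if it was
     * still circulating after `maxDeliveries` applications.
     */
    static int simulate(Map<String, List<String>> peers, String origin, int maxDeliveries) {
        Deque<Delivery> queue = new ArrayDeque<>();
        queue.add(new Delivery(origin, origin)); // local write: ID = local cluster
        int applied = 0;
        while (!queue.isEmpty()) {
            Delivery d = queue.poll();
            for (String peer : peers.getOrDefault(d.cluster(), List.of())) {
                // The only loop check: never ship an edit back to its origin cluster.
                if (peer.equals(d.originId())) continue;
                if (++applied > maxDeliveries) return -1;
                // The sink keeps the incoming cluster ID rather than resetting it.
                queue.add(new Delivery(peer, d.originId()));
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        // Ring that includes every cluster: A -> B -> C -> A.
        Map<String, List<String>> ring = Map.of(
                "A", List.of("B"), "B", List.of("C"), "C", List.of("A"));
        // The accidental setup: A <-> B master/master, plus C -> A.
        Map<String, List<String>> accidental = Map.of(
                "A", List.of("B"), "B", List.of("A"), "C", List.of("A"));
        System.out.println("ring, edit from A: " + simulate(ring, "A", 100));
        System.out.println("A<->B plus C->A, edit from C: " + simulate(accidental, "C", 100));
    }
}
```

In the ring the edit is applied at B and C and then stops, because C's only peer is the origin A. In the accidental topology the edit from C never matches A or B, so it bounces between them until the simulation's cap is hit and `simulate` returns -1.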
> Infinite loop possible in Master/Master replication
> ---------------------------------------------------
>
>                 Key: HBASE-7709
>                 URL: https://issues.apache.org/jira/browse/HBASE-7709
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.6
>
>
> We just discovered the following scenario:
> # Clusters A and B are set up in master/master replication.
> # By accident we had Cluster C replicate to Cluster A.
> Now all edits originating from C will be bouncing between A and B. Forever!
> The reason is that when the edits come in from C, the cluster ID is already set and won't be reset.
> We have a couple of options here:
> # Optionally, only support master/master (not cycles of more than two clusters). In that case we can always reset the cluster ID in the ReplicationSource. That means that cycles > 2 will now have the data cycle forever. This is the only option that requires no changes in the HLog format.
> # Instead of a single cluster ID per edit, maintain an (unordered) set of cluster IDs that have seen this edit. Then in ReplicationSource we drop any edit that the sink has seen already. This is the cleanest approach, but it might need a lot of data stored per edit if there are many clusters involved.
> # Maintain a configurable counter of the maximum cycle size we want to support. Could default to 10 (maybe even just ...). Store a hop-count in the WAL, and have the ReplicationSource increase that hop-count on each hop. If we're over the max, just drop the edit.
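Options 2 and 3 above can be sketched with the same kind of toy model. Again this is hypothetical and not the HBase WAL format or the ReplicationSource API: `LoopFreeReplication`, `simulateWithIdSet` and `simulateWithHopCount` are illustrative names. Option 2 carries the set of cluster IDs that have seen the edit and drops it at any sink already in the set; option 3 carries a hop count and drops the edit once it exceeds `maxHops`. Keeping the existing origin-ID check alongside the hop count is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LoopFreeReplication {
    // Option 2 (hypothetical model): the edit records every cluster that has seen it.
    record SetEdit(String cluster, Set<String> seenBy) {}

    static int simulateWithIdSet(Map<String, List<String>> peers, String origin) {
        Deque<SetEdit> queue = new ArrayDeque<>();
        queue.add(new SetEdit(origin, Set.of(origin)));
        int applied = 0;
        while (!queue.isEmpty()) {
            SetEdit e = queue.poll();
            for (String peer : peers.getOrDefault(e.cluster(), List.of())) {
                if (e.seenBy().contains(peer)) continue; // sink has seen it: drop
                Set<String> seen = new HashSet<>(e.seenBy());
                seen.add(peer); // the sink adds its own ID before re-shipping
                applied++;
                queue.add(new SetEdit(peer, seen));
            }
        }
        return applied;
    }

    // Option 3 (hypothetical model): the edit carries a hop count.
    record HopEdit(String cluster, String originId, int hops) {}

    static int simulateWithHopCount(Map<String, List<String>> peers, String origin, int maxHops) {
        Deque<HopEdit> queue = new ArrayDeque<>();
        queue.add(new HopEdit(origin, origin, 0));
        int applied = 0;
        while (!queue.isEmpty()) {
            HopEdit e = queue.poll();
            for (String peer : peers.getOrDefault(e.cluster(), List.of())) {
                if (peer.equals(e.originId())) continue; // assumed: origin check kept
                int hops = e.hops() + 1; // the source increments on each hop
                if (hops > maxHops) continue; // over the max: drop the edit
                applied++;
                queue.add(new HopEdit(peer, e.originId(), hops));
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        // The accidental setup: A <-> B master/master, plus C -> A.
        Map<String, List<String>> accidental = Map.of(
                "A", List.of("B"), "B", List.of("A"), "C", List.of("A"));
        System.out.println("ID set: " + simulateWithIdSet(accidental, "C"));
        System.out.println("hop count (max 10): " + simulateWithHopCount(accidental, "C", 10));
    }
}
```

On the accidental topology the ID-set version stops after two applications (at A, then at B), while the hop-count version lets the edit bounce until it has made `maxHops` hops, so A and B each re-apply it several times before it is dropped. That mirrors the trade-off in the description: the set is exact but grows with the number of clusters, and the counter has a fixed size but only bounds the loop.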