Date: Fri, 20 Sep 2013 22:27:02 +0000 (UTC)
From: "Hudson (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-7634) Replication handling of changes to peer clusters is inefficient


    [ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773522#comment-13773522 ]

Hudson commented on HBASE-7634:
-------------------------------

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #747 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/747/])
HBASE-9594 Add reference documentation on changes made by HBASE-7634 (Replication handling of peer cluster changes) (stack: rev 1525110)
* /hbase/trunk/src/main/site/xdoc/replication.xml


> Replication handling of changes to peer clusters is inefficient
> ---------------------------------------------------------------
>
>                 Key: HBASE-7634
>                 URL: https://issues.apache.org/jira/browse/HBASE-7634
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.95.2
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>             Fix For: 0.98.0, 0.95.2
>
>         Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch, HBASE-7634.v6.patch
>
>
> The current handling of changes to the region servers in a replication peer cluster is quite inefficient. The list of region servers that are being replicated to is only updated if a large number of issues are encountered while replicating.
> This can cause it to take quite a while to recognize that a number of the regionservers in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers).
> Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randomly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing.
> A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
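
The second problem in the quoted description (the sink list is only refreshed after repeated failures) can be illustrated with a small simulation. This is a minimal sketch and not HBase's ReplicationSource code: the names `PeerCluster`, `NaiveSource`, `ship` and the `failure_threshold` of 3 are all hypothetical, chosen only to model the behaviour the issue describes.

```python
import random


class PeerCluster:
    """Hypothetical stand-in for a peer cluster: the set of live region servers."""

    def __init__(self, servers):
        self.live = set(servers)


class NaiveSource:
    """Models the behaviour described in the issue (an assumption, not HBase code):
    the sink list is a snapshot that is only refreshed after enough failures."""

    def __init__(self, peer, rng, failure_threshold=3):
        self.peer = peer
        self.rng = rng
        self.failure_threshold = failure_threshold  # hypothetical value
        self.failures = 0
        self.sinks = sorted(peer.live)  # snapshot taken once, at startup

    def ship(self):
        """Try to ship to a randomly chosen sink; return its name, or None on failure."""
        target = self.rng.choice(self.sinks)
        if target in self.peer.live:
            return target
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Only now does the source notice changes to the peer cluster.
            self.sinks = sorted(self.peer.live)
            self.failures = 0
        return None


if __name__ == "__main__":
    peer = PeerCluster(["rs1", "rs2"])
    source = NaiveSource(peer, random.Random(0))
    peer.live.update({"rs3", "rs4"})  # servers added after replication started
    used = {source.ship() for _ in range(200)}
    print(sorted(used))  # rs3 and rs4 never appear, since no write ever fails
```

In this model, servers added to the peer after startup are never selected as long as every write succeeds, which matches the "never be used for replication (unless there are failures)" behaviour reported in the issue.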