Date: Fri, 3 Jun 2016 14:56:59 +0000 (UTC)
From: "Hudson (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-12769) Replication fails to delete all corresponding zk nodes when peer is removed

    [ https://issues.apache.org/jira/browse/HBASE-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314239#comment-15314239 ]

Hudson commented on HBASE-12769:
--------------------------------

SUCCESS: Integrated in HBase-1.3-IT #689 (See [https://builds.apache.org/job/HBase-1.3-IT/689/])
HBASE-15888 Extend HBASE-12769 for bulk load data replication (ashishsinghi: rev b0e1fdae346b64af4188cf5df29488617753416f)
* hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeersZKImpl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/hbck/ReplicationChecker.java


> Replication fails to delete all corresponding zk nodes when peer is removed
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-12769
>                 URL: https://issues.apache.org/jira/browse/HBASE-12769
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>    Affects Versions: 0.99.2
>            Reporter: Jianwei Cui
>            Assignee: Jianwei Cui
>            Priority: Minor
>             Fix For: 2.0.0, 1.3.0
>
>         Attachments: 12769-branch-1-v5.txt, 12769-v2.txt, 12769-v3.txt, 12769-v4.txt, 12769-v5.txt, 12769-v6.txt, HBASE-12769-trunk-v0.patch, HBASE-12769-trunk-v1.patch
>
>
> When removing a peer, the client side deletes the peerId under the
> peersZNode node; the alive region servers are then notified and delete
> the corresponding hlog queues under their rsZNode of replication.
> However, if there are failed servers whose hlog queues have not yet been
> transferred to alive servers (this is likely when
> "replication.sleep.before.failover" is set to a large value and many
> region servers have restarted), these hlog queues won't be deleted after
> the peer is removed. I think remove_peer should guarantee that all
> corresponding zk nodes have been removed by the time it completes;
> otherwise, if we create a new peer with the same peerId as the removed
> one, unexpected data might be replicated.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
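
The check described in the issue can be sketched in plain Java, without any ZooKeeper calls. This is only an illustration under stated assumptions, not the code from the attached patches: the class and method names are hypothetical, the map stands in for the children of each rsZNode, and the queue id layout ("<peerId>" for a live queue, "<peerId>-<deadServer>..." for a transferred one) is assumed here rather than taken from the message above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: find replication queues that still reference a removed peer.
public class OrphanQueueSketch {

    // Assumption: the peer id is everything before the first '-' in a queue id.
    static String peerIdOf(String queueId) {
        int dash = queueId.indexOf('-');
        return dash < 0 ? queueId : queueId.substring(0, dash);
    }

    // Returns "<server>/<queueId>" for every queue that belongs to removedPeerId.
    // The scan covers every server in the map, alive or failed, because queues
    // left under failed servers are the ones the issue says are missed.
    static List<String> findQueuesOfPeer(Map<String, List<String>> queuesByServer,
                                         String removedPeerId) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : queuesByServer.entrySet()) {
            for (String queueId : entry.getValue()) {
                if (peerIdOf(queueId).equals(removedPeerId)) {
                    found.add(entry.getKey() + "/" + queueId);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the queue znodes under each region server.
        Map<String, List<String>> queuesByServer = new LinkedHashMap<>();
        queuesByServer.put("rsAlive", Arrays.asList("2"));
        queuesByServer.put("rsFailed", Arrays.asList("1", "1-rsOlder", "2"));

        // Peer "1" has been removed; list the queues that still need deleting.
        for (String path : findQueuesOfPeer(queuesByServer, "1")) {
            System.out.println("leftover queue: " + path);
        }
    }
}
```

A real cleanup would delete each path it finds instead of printing it, and would have to cope with servers failing while the scan runs; neither is modelled here.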