Date: Thu, 29 Oct 2015 02:37:27 +0000 (UTC)
From: "Hudson (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-12769) Replication fails to delete all corresponding zk nodes when peer is removed

    [ https://issues.apache.org/jira/browse/HBASE-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979690#comment-14979690 ]

Hudson commented on HBASE-12769:
--------------------------------

FAILURE: Integrated in HBase-1.3-IT #279 (See [https://builds.apache.org/job/HBase-1.3-IT/279/])
HBASE-12769 Replication fails to delete all corresponding zk nodes when (tedyu: rev 3c8e92019ce87bba3d7c99342bf626c2076f24ac)
* hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/replication/ReplicationAdmin.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeersZKImpl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationFactory.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/client/replication/TestReplicationAdmin.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/hbck/ReplicationChecker.java


> Replication fails to delete all corresponding zk nodes when peer is removed
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-12769
>                 URL: https://issues.apache.org/jira/browse/HBASE-12769
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>    Affects Versions: 0.99.2
>            Reporter: Jianwei Cui
>            Assignee: Jianwei Cui
>            Priority: Minor
>             Fix For: 2.0.0, 1.3.0
>
>         Attachments: 12769-branch-1-v5.txt, 12769-v2.txt, 12769-v3.txt, 12769-v4.txt, 12769-v5.txt, 12769-v6.txt, HBASE-12769-trunk-v0.patch, HBASE-12769-trunk-v1.patch
>
>
> When a peer is removed, the client deletes the peerId under the peersZNode node; live region servers are then notified and delete the corresponding hlog queues under their rsZNode of replication. However, if there are failed servers whose hlog queues have not yet been transferred by live servers (this is likely when "replication.sleep.before.failover" is set to a large value and many region servers have restarted), these hlog queues won't be deleted after the peer is removed. I think remove_peer should guarantee that all corresponding zk nodes have been removed by the time it completes; otherwise, if we create a new peer with the same peerId as the removed one, unexpected data might be replicated.
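
The failure mode described in the issue can be illustrated with a small in-memory model. This is only a sketch, not HBase code: the znode tree is replaced by a plain set of peer ids and a dict of per-server queue ids, and the names `peer_id_of`, `find_orphan_queues`, and `purge_peer_queues` are hypothetical. It also assumes that a queue id is either the peer id itself or the peer id followed by `-` and the names of failed servers, and that peer ids contain no `-`.

```python
def peer_id_of(queue_id):
    """Return the peer id part of a queue id.

    Assumption for this sketch: the peer id is the text before the first '-'.
    """
    return queue_id.split("-", 1)[0]


def find_orphan_queues(peers, queues_by_server):
    """Return {server: [queue ids]} for queues whose peer no longer exists."""
    orphans = {}
    for server, queue_ids in queues_by_server.items():
        stale = [q for q in queue_ids if peer_id_of(q) not in peers]
        if stale:
            orphans[server] = stale
    return orphans


def purge_peer_queues(peer_id, queues_by_server):
    """Delete every queue of peer_id on every server, live or failed.

    Returns the number of queues removed.
    """
    removed = 0
    for server, queue_ids in queues_by_server.items():
        kept = [q for q in queue_ids if peer_id_of(q) != peer_id]
        removed += len(queue_ids) - len(kept)
        queues_by_server[server] = kept
    return removed


if __name__ == "__main__":
    peers = {"1", "2"}
    queues = {
        "rs_live": ["1", "2"],
        # A failed server whose queues no live server has claimed yet.
        "rs_dead": ["1", "2", "1-rs_older"],
    }

    # Removal as described in the issue: the peer id is deleted and only
    # the live server reacts to the notification.
    peers.discard("1")
    queues["rs_live"].remove("1")
    print("left behind:", find_orphan_queues(peers, queues))

    # Removal that also sweeps the queues of failed servers.
    purge_peer_queues("1", queues)
    print("after sweep:", find_orphan_queues(peers, queues))
```

In this model, a later peer created with id `1` would inherit whatever `find_orphan_queues` still reports, which is the unexpected replication the reporter warns about.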