Date: Fri, 1 Jul 2016 07:49:11 +0000 (UTC)
From: "Phil Yang (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-16144) Replication queue's lock will live forever if RS acquiring the lock has died prematurely

    [ https://issues.apache.org/jira/browse/HBASE-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358591#comment-15358591 ]

Phil Yang commented on HBASE-16144:
-----------------------------------

If the RS gets "session expired", RecoverableZooKeeper will try to reconnect instead of crashing. If we used an ephemeral node for the lock, the lock would be gone after the reconnect, so more than one RS could copy the queue. In other words, if an ephemeral node has disappeared, we cannot conclude that the server has died.

> Replication queue's lock will live forever if RS acquiring the lock has died prematurely
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-16144
>                 URL: https://issues.apache.org/jira/browse/HBASE-16144
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.1, 1.1.5, 0.98.20
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>         Attachments: HBASE-16144-v1.patch, HBASE-16144-v2.patch
>
>
> By default, we use a multi operation when we claimQueues from ZK. But if hbase.zookeeper.useMulti=false is set, we first add a lock, then copy the nodes, and finally clean up the old queue and the lock.
> However, if the RS that acquired the lock crashes before claimQueues is done, the lock will remain forever and no other RS can ever claim the queue.
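
To make the two failure modes discussed above concrete, here is a minimal in-memory sketch. It does not use HBase or ZooKeeper code: FakeZk, lock_and_copy, and the znode paths are hypothetical names invented for this illustration, and the model only assumes that node creation fails if the node exists and that session expiry removes that session's ephemeral nodes.

```python
# Toy model only: FakeZk and the paths below are hypothetical, not HBase/ZooKeeper APIs.

DEAD_QUEUE = "/replication/rs/dead-rs/queue"  # hypothetical path of the dead RS's queue
LOCK = DEAD_QUEUE + "/lock"                   # hypothetical lock znode


class FakeZk:
    """Maps znode path -> owning session (None means a persistent node)."""

    def __init__(self):
        self.nodes = {}

    def create(self, path, session=None):
        # Creation fails if the node already exists (returns False instead of raising).
        if path in self.nodes:
            return False
        self.nodes[path] = session
        return True

    def expire_session(self, session):
        # Session expiry removes every ephemeral node owned by that session.
        for path in [p for p, s in self.nodes.items() if s == session]:
            del self.nodes[path]


def lock_and_copy(zk, rs, ephemeral):
    """First two steps of the non-multi claim: take the lock, then copy the queue.

    The final cleanup step is deliberately left out so a crash or session
    expiry can be simulated right after the copy.
    """
    if DEAD_QUEUE not in zk.nodes:
        return False  # nothing left to claim
    if not zk.create(LOCK, session=rs if ephemeral else None):
        return False  # another RS holds the lock
    zk.create(f"/replication/rs/{rs}/queue-from-dead-rs")  # the "copy nodes" step
    return True


# Case 1: persistent lock, rs1 crashes after copying. The lock is never removed.
zk = FakeZk()
zk.create(DEAD_QUEUE)
lock_and_copy(zk, "rs1", ephemeral=False)
print(lock_and_copy(zk, "rs2", ephemeral=False))  # False: rs2 is blocked by the stale lock

# Case 2: ephemeral lock, rs1's session expires but rs1 is alive and reconnects.
zk = FakeZk()
zk.create(DEAD_QUEUE)
lock_and_copy(zk, "rs1", ephemeral=True)
zk.expire_session("rs1")                          # lock vanishes, rs1 has not died
print(lock_and_copy(zk, "rs2", ephemeral=True))   # True: the queue is now copied twice
```

Case 1 is the bug reported in this issue; case 2 is why replacing the lock with an ephemeral node is not a safe fix on its own.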