Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Fri, 1 Jul 2016 07:49:11 +0000 (UTC)
From: "Phil Yang (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12984790.1467183338000.2565.1467359351205@Atlassian.JIRA>
In-Reply-To: <JIRA.12984790.1467183338000@Atlassian.JIRA>
References: <JIRA.12984790.1467183338000@Atlassian.JIRA> <JIRA.12984790.1467183338859@arcas>
Subject: [jira] [Commented] (HBASE-16144) Replication queue's lock will live
 forever if RS acquiring the lock has died prematurely
archived-at: Fri, 01 Jul 2016 07:49:12 -0000


    [ https://issues.apache.org/jira/browse/HBASE-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358591#comment-15358591 ] 

Phil Yang commented on HBASE-16144:
-----------------------------------

If the RS gets a "session expired" event, RecoverableZooKeeper will try to reconnect instead of letting the RS crash. If we use an ephemeral node for the lock, the lock is gone after the reconnect, so more than one RS may copy the queue. In other words, if the ephemeral node disappears, we cannot conclude that the server has died.
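The race described above can be illustrated with a minimal sketch. It assumes a toy in-memory stand-in for ZooKeeper: `ToyZooKeeper`, its methods and the lock path are hypothetical and are not the real ZooKeeper or HBase API. The model captures only one rule: when a session expires, every ephemeral node owned by that session is deleted, even if the client later reconnects with a new session.

```python
"""Toy model of ZooKeeper ephemeral znodes (hypothetical, not the real API)."""


class ToyZooKeeper:
    def __init__(self):
        self.nodes = {}           # path -> owning session id (None = persistent)
        self.next_session = 1

    def new_session(self):
        """Open a new client session and return its id."""
        sid = self.next_session
        self.next_session += 1
        return sid

    def create(self, path, session, ephemeral):
        """Create a znode; return False if it already exists (like NodeExists)."""
        if path in self.nodes:
            return False
        self.nodes[path] = session if ephemeral else None
        return True

    def expire(self, session):
        """Session expiry removes every ephemeral node owned by that session."""
        self.nodes = {p: s for p, s in self.nodes.items() if s != session}


def demo():
    zk = ToyZooKeeper()
    lock = "/replication/rs/dead-rs/lock"     # hypothetical lock path
    rs1 = zk.new_session()
    assert zk.create(lock, rs1, ephemeral=True)  # RS1 takes the lock, starts copying

    zk.expire(rs1)          # RS1's session expires, so its ephemeral lock is deleted
    rs1 = zk.new_session()  # RS1 reconnects and still believes it is mid-copy

    rs2 = zk.new_session()
    # RS2 finds no lock and also starts copying the same queue.
    return zk.create(lock, rs2, ephemeral=True)


print(demo())  # True
```

A persistent lock node (`ephemeral=False`) would survive `expire`, which avoids this race but leads to the stale-lock problem in the issue description.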

> Replication queue's lock will live forever if RS acquiring the lock has died prematurely
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-16144
>                 URL: https://issues.apache.org/jira/browse/HBASE-16144
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.1, 1.1.5, 0.98.20
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>         Attachments: HBASE-16144-v1.patch, HBASE-16144-v2.patch
>
>
> By default, claimQueues uses a ZooKeeper multi operation. But if we set hbase.zookeeper.useMulti=false, we first add a lock, then copy the nodes, and finally delete the old queue and the lock.
> However, if the RS acquiring the lock crashes before claimQueues finishes, the lock will remain forever and no other RS can ever claim the queue.
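The useMulti=false sequence and its failure mode can be sketched with a plain dict standing in for ZooKeeper. This is a hypothetical model, not HBase code: `claim_without_multi`, `Crash` and the `/rs/<server>/...` paths are illustrative names, and the lock is modeled as a persistent node.

```python
class Crash(Exception):
    """Simulates the claiming RS dying in the middle of claimQueues."""


def claim_without_multi(zk, dead_rs, me, crash_at=None):
    """Hypothetical non-multi claim: lock, copy, delete old queues, delete lock.

    `zk` is a dict mapping znode paths to values (toy stand-in for ZooKeeper).
    """
    prefix = f"/rs/{dead_rs}/"
    lock = prefix + "lock"
    if lock in zk:
        return False                       # lock already exists: give up
    zk[lock] = me                          # 1. create a persistent lock node
    if crash_at == "after_lock":
        raise Crash()
    queues = {p: v for p, v in zk.items() if p.startswith(prefix) and p != lock}
    for path, value in queues.items():     # 2. copy the queues under our own node
        zk[path.replace(prefix, f"/rs/{me}/", 1)] = value
    if crash_at == "after_copy":
        raise Crash()
    for path in queues:                    # 3. delete the old queues
        del zk[path]
    del zk[lock]                           # 4. delete the lock last
    return True


zk = {"/rs/rs-dead/queue-1": "wal.1"}
try:
    claim_without_multi(zk, "rs-dead", "rs-a", crash_at="after_copy")
except Crash:
    pass                                   # rs-a died; nobody ever removes its lock

print(claim_without_multi(zk, "rs-dead", "rs-b"))  # False
print("/rs/rs-dead/lock" in zk)                    # True
```

Because the lock is only deleted in the final step, a crash anywhere between creating and deleting it leaves the lock in place, so every later claim attempt returns early.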


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)