Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 31 May 2017 02:47:04 +0000 (UTC)
From: "Yu Li (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.13075805.1496143931000.327209.1496198824137@Atlassian.JIRA>
In-Reply-To: <JIRA.13075805.1496143931000@Atlassian.JIRA>
References: <JIRA.13075805.1496143931000@Atlassian.JIRA> <JIRA.13075805.1496143931795@jira-lw-us.apache.org>
Subject: [jira] [Commented] (HBASE-18131) Add an hbase shell command to
 clear deadserver list in ServerManager
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Wed, 31 May 2017 02:47:08 -0000


    [ https://issues.apache.org/jira/browse/HBASE-18131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030535#comment-16030535 ] 

Yu Li commented on HBASE-18131:
-------------------------------

bq. I think the root cause of this is not that servers are in the dead-servers list indefinitely
Possibly, but a server will still be left in the dead-servers list (until the master restarts) if it's stopped on purpose, e.g. for hardware repair, right? So I think the new command would still be a useful tool in such a scenario. Thanks.

> Add an hbase shell command to clear deadserver list in ServerManager
> --------------------------------------------------------------------
>
>                 Key: HBASE-18131
>                 URL: https://issues.apache.org/jira/browse/HBASE-18131
>             Project: HBase
>          Issue Type: New Feature
>          Components: Operability
>            Reporter: Yu Li
>            Assignee: Yu Li
>             Fix For: 2.0.0, 1.4.0
>
>
> Currently, if a regionserver is aborted due to a fatal error or stopped by an operator on purpose, it is added to the {{ServerManager#deadservers}} list and shown under "Dead Servers" in the master UI. This is a valid warning that lets operators notice self-aborted servers and run a sanity check to avoid further issues. However, after the necessary checks, even if the operator is sure that the node is decommissioned (such as for repair), there's no way to clear the dead server list except restarting the master. See more details in [this discussion|http://mail-archives.apache.org/mod_mbox/hbase-user/201705.mbox/%3CCAM7-19%2BD4MLu2b1R94%2BtWQDspjfny2sCy4Qit8JtCgjvTOZzzg%40mail.gmail.com%3E] on the mailing list.
> Here we propose adding an hbase shell command that allows advanced users to clear the dead server list in {{ServerManager}}; the command should be executed with caution.

