Date: Wed, 31 May 2017 02:47:04 +0000 (UTC)
From: "Yu Li (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-18131) Add an hbase shell command to clear deadserver list in ServerManager

    [ https://issues.apache.org/jira/browse/HBASE-18131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030535#comment-16030535 ]

Yu Li commented on HBASE-18131:
-------------------------------

bq. I think the root cause of this is not that servers are in the dead-servers list indefinitely

Possibly, but the server will be left in the dead-servers list (until the master restarts) if it's stopped on purpose (for example, for hardware repair), right? So I think the new command will still be a useful tool in such a scenario. Thanks.

> Add an hbase shell command to clear deadserver list in ServerManager
> --------------------------------------------------------------------
>
>                 Key: HBASE-18131
>                 URL: https://issues.apache.org/jira/browse/HBASE-18131
>             Project: HBase
>          Issue Type: New Feature
>          Components: Operability
>            Reporter: Yu Li
>            Assignee: Yu Li
>             Fix For: 2.0.0, 1.4.0
>
>
> Currently, if a regionserver is aborted due to a fatal error or stopped by an operator on purpose, it is added to the {{ServerManager#deadservers}} list and shown under "Dead Servers" in the master UI.
> This is a valid warning that lets operators notice self-aborted servers and run a sanity check to avoid further issues. However, after the necessary checks, even if the operator is sure that the node is decommissioned (such as for repair), there is no way to clear the dead server list except by restarting the master. See more details in [this discussion|http://mail-archives.apache.org/mod_mbox/hbase-user/201705.mbox/%3CCAM7-19%2BD4MLu2b1R94%2BtWQDspjfny2sCy4Qit8JtCgjvTOZzzg%40mail.gmail.com%3E] on the mailing list.
> Here we propose to add an hbase shell command that allows advanced users to clear the dead server list in {{ServerManager}}; the command should be executed with caution.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)