Date: Thu, 20 Jul 2017 20:02:00 +0000 (UTC)
From: "Samir Ahmic (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-7386) Investigate providing some supervisor support for znode deletion

     [ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samir Ahmic updated HBASE-7386:
-------------------------------
    Attachment: HBASE-7386-master-00.patch

Here is the first fully operational patch for running HBase processes under Python supervisor control. The patch creates new bin/supervisord and conf/supervisord directories: the first contains supporting scripts for managing the cluster (start/stop/restart/check) and the second contains the supervisor config files.

To test this you will need to install Python supervisor (3.3.2), usually with "pip install supervisor==3.3.2". No additional steps are required: configure your HBase as usual, go to /bin/supervisor and run ./start-supervisord-hbase.sh.

There is also a Python script, zk_cleaner.py, which acts as a process event listener in charge of removing the master/RS znode when the process is in the stopping or exit state.

This is the first version of the patch, and the whole idea will need more testing and code polishing. All comments and suggestions are welcome.

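For readers unfamiliar with supervisor event listeners, the following is a minimal sketch of the listener protocol that a script like zk_cleaner.py would have to speak. It is not the code from the attached patch. It assumes the listener is subscribed to the PROCESS_STATE_STOPPING and PROCESS_STATE_EXITED events, and delete_znode is a hypothetical placeholder for the actual ZooKeeper cleanup.

```python
import sys

# Supervisor event names this sketch treats as "the process is going away".
WATCHED_EVENTS = {"PROCESS_STATE_STOPPING", "PROCESS_STATE_EXITED"}


def parse_tokens(line):
    """Parse a 'key:value key:value' line into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def delete_znode(process_name):
    """Hypothetical placeholder: a real listener would remove the
    master/RS znode in ZooKeeper here."""
    sys.stderr.write("would delete znode for %s\n" % process_name)


def handle_event(headers, payload, on_exit=delete_znode):
    """Call on_exit for watched events; return True when it was called."""
    if headers.get("eventname") not in WATCHED_EVENTS:
        return False
    on_exit(parse_tokens(payload).get("processname", ""))
    return True


def run(stdin=sys.stdin, stdout=sys.stdout):
    """Event loop: announce READY, read one event, acknowledge it."""
    while True:
        stdout.write("READY\n")
        stdout.flush()
        header_line = stdin.readline()
        if not header_line:  # supervisor closed our stdin
            break
        headers = parse_tokens(header_line)
        payload = stdin.read(int(headers["len"]))
        handle_event(headers, payload)
        # Acknowledge the event: "RESULT <length>\n" followed by the body.
        stdout.write("RESULT 2\nOK")
        stdout.flush()

# A listener script would end by calling run().
```

Supervisor would start such a script from an [eventlistener:x] section of its configuration whose events= setting names the two process state events. Anything the listener logs has to go to stderr, because stdout is reserved for the READY/RESULT handshake.
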
> Investigate providing some supervisor support for znode deletion
> ----------------------------------------------------------------
>
>                 Key: HBASE-7386
>                 URL: https://issues.apache.org/jira/browse/HBASE-7386
>             Project: HBase
>          Issue Type: Task
>          Components: master, regionserver, scripts
>            Reporter: Gregory Chanan
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 3.0.0
>
>         Attachments: HBASE-7386-bin.patch, HBASE-7386-bin-v2.patch, HBASE-7386-bin-v3.patch, HBASE-7386-conf.patch, HBASE-7386-conf-v2.patch, HBASE-7386-conf-v3.patch, HBASE-7386-master-00.patch, HBASE-7386-src.patch, HBASE-7386-v0.patch, supervisordconfigs-v0.patch
>
>
> There are a couple of JIRAs for deleting the znode on a process failure:
> HBASE-5844 (RS)
> HBASE-5926 (Master)
> which are pretty neat; on process failure, they delete the znode of the underlying process so HBase can recover faster.
> These JIRAs were implemented via the startup scripts; i.e. the script hangs around and waits for the process to exit, then deletes the znode.
> There are a few problems associated with this approach, as listed in the below JIRAs:
> 1) Hides startup output in script
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 2) Two HBase processes listed per launched daemon
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 3) Not run by a real supervisor
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 4) Weird output after kill -9 of the actual process in standalone mode
> https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
> 5) Can kill existing RS if called again
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 6) Hides stdout/stderr
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
> I suspect running it via something like supervisor.d can solve these issues if we provide the right support.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)