hbase-issues mailing list archives

From "Samir Ahmic (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7386) Investigate providing some supervisor support for znode deletion
Date Wed, 12 Jul 2017 21:32:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084730#comment-16084730
] 

Samir Ahmic commented on HBASE-7386:
------------------------------------

[~stack] I have done some testing with the latest patches against the master branch, and the good news is
that most of the code (with small changes) and functionality works fine. So the original idea, to improve
MTTR by removing stale master and RS znodes, plus a watchdog that restarts the process in case
of unexpected failure, is still valid.
My original scripts here were written as an optional route for managing HBase processes
using supervisor, and that approach opens a couple of questions which I would like to discuss:
# Amount of code added and options to reduce it (I will try to reduce it to a minimum anyway);
probably some code can be integrated into the existing scripts to avoid copying.
# Where are we going to add the new scripts? A supervisord folder inside the bin dir was my original
idea, and the same goes for the config files: a supervisord folder in the conf dir.
# Testing: I will cover supervisor 3.3.2 (the latest stable version) and some older versions that
are installed through system package managers.
# And finally, would it be better to implement our own Java supervisor that would do a similar
thing to the Python implementation?

Based on what we decide I will continue work here; if we go with the Python supervisor I can have
a patch ready for testing in a couple of days.
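
For readers following the thread, the watchdog idea described above (wait for the process to exit, remove the stale znode, restart) can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the attached patches and not how supervisor itself is implemented: the function name supervise, the delete_znode callback and the max_restarts limit are all hypothetical.

```python
import subprocess
from typing import Callable, List


def supervise(cmd: List[str],
              delete_znode: Callable[[], None],
              max_restarts: int = 3) -> int:
    """Run cmd; on every unexpected exit clean up and restart.

    delete_znode is a hypothetical callback standing in for whatever
    removes the stale master/RS znode from ZooKeeper.
    Returns the number of restarts that were performed.
    """
    restarts = 0
    while True:
        # Blocks until the child process exits and yields its exit code.
        exit_code = subprocess.call(cmd)
        if exit_code == 0:
            # Clean shutdown: nothing to clean up, do not restart.
            return restarts
        # Unexpected failure: remove the stale znode so recovery can
        # start without waiting for the ZooKeeper session to time out.
        delete_znode()
        if restarts >= max_restarts:
            # Give up after the configured number of restarts.
            return restarts
        restarts += 1
```

A caller would pass the command that starts the daemon in the foreground plus a cleanup function, for example supervise(start_cmd, cleanup, max_restarts=5), where start_cmd and cleanup are placeholders. A real supervisor would also need backoff between restarts and signal handling, which this sketch leaves out.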

> Investigate providing some supervisor support for znode deletion
> ----------------------------------------------------------------
>
>                 Key: HBASE-7386
>                 URL: https://issues.apache.org/jira/browse/HBASE-7386
>             Project: HBase
>          Issue Type: Task
>          Components: master, regionserver, scripts
>            Reporter: Gregory Chanan
>            Assignee: stack
>            Priority: Blocker
>         Attachments: HBASE-7386-bin.patch, HBASE-7386-bin-v2.patch, HBASE-7386-bin-v3.patch,
HBASE-7386-conf.patch, HBASE-7386-conf-v2.patch, HBASE-7386-conf-v3.patch, HBASE-7386-src.patch,
HBASE-7386-v0.patch, supervisordconfigs-v0.patch
>
>
> There are a couple of JIRAs for deleting the znode on a process failure:
> HBASE-5844 (RS)
> HBASE-5926 (Master)
> which are pretty neat; on process failure, they delete the znode of the underlying process so HBase can recover faster.
> These JIRAs were implemented via the startup scripts; i.e. the script hangs around and waits for the process to exit, then deletes the znode.
> There are a few problems associated with this approach, as listed in the below JIRAs:
> 1) Hides startup output in script
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 2) two hbase processes listed per launched daemon
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 3) Not run by a real supervisor
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 4) Weird output after kill -9 actual process in standalone mode
> https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
> 5) Can kill existing RS if called again
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 6) Hides stdout/stderr[6]
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
> I suspect running it via something like supervisor.d can solve these issues if we provide the right support.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
