hbase-issues mailing list archives

From "Loknath Priyatham Teja Singamsetty (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15924) Enhance hbase services autorestart capability to hbase-daemon.sh
Date Wed, 19 Oct 2016 08:34:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588054#comment-15588054 ]

Loknath Priyatham Teja Singamsetty  commented on HBASE-15924:
-------------------------------------------------------------

[~apurtell] 

{quote}
One option is to make a PID file of the supervisor, check if the supervisor PID file exists
and is valid, if so then send a signal to the supervisor to terminate it, then terminate the
child under watch.
{quote}

Autostart works by placing a marker file, for example regionserver.autostart, under HBASE_PID_DIR.
As soon as stop is issued, the script first removes this file so that autostart no longer triggers.
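
For illustration, a minimal sketch of the marker-file idea. This is not the code from the attached patch: the function names are hypothetical, and the /tmp default for HBASE_PID_DIR is only an assumption for the example.

```shell
# Hypothetical sketch of the marker-file approach; not taken from the patch.
# Assumption: fall back to /tmp when HBASE_PID_DIR is not set.
HBASE_PID_DIR="${HBASE_PID_DIR:-/tmp}"

# Print the marker path for a service, e.g. regionserver.autostart.
autostart_marker() {
  echo "${HBASE_PID_DIR}/$1.autostart"
}

# On autostart: record that the service should be restarted if it dies.
enable_autostart() {
  touch "$(autostart_marker "$1")"
}

# First step of stop: remove the marker so no further restart is attempted.
disable_autostart() {
  rm -f "$(autostart_marker "$1")"
}

# The supervising loop would check this before every restart attempt.
autostart_enabled() {
  [ -f "$(autostart_marker "$1")" ]
}
```

With this shape, stop only has to call disable_autostart before killing the process; the supervising loop then sees the marker is gone and exits instead of restarting.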

{quote}
In another test, I started the regionserver with ./bin/hbase-daemon.sh --autostart-window-retry-limit
3 autostart regionserver and in another SSH session then attempted to stop the regionserver
with ./bin/hbase-daemon.sh stop regionserver. This appears to work, although I can see by
tailing the regionserver log output file that the regionserver process is partially restarted
and rapidly killed.
{quote}

I did not see this behavior in my testing. Please provide the repro steps. I am assuming that you
are not using the sfdc packages: with our internal packages we have a different mechanism, based on
cron, which starts the process when it is killed. Kindly re-check whether you are testing on one of
our internal clusters where cron-based autorestart is already enabled.

Also note that I have added a minor enhancement: autostart now waits 20 seconds after the
hmaster/regionserver process is killed ungracefully. This gives any shutdown hook time to run
before autostart triggers the start command.
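
A rough sketch of that wait, for illustration only. The function and variable names are hypothetical, and treating a non-zero exit status as "killed ungracefully" is an assumption of this example rather than how the patch detects it.

```shell
# Hypothetical sketch; the 20-second value comes from the comment above.
RESTART_GRACE_SECONDS="${RESTART_GRACE_SECONDS:-20}"

# Print the number of seconds to wait before restarting.
# Assumption: a non-zero exit status stands in for an ungraceful kill.
restart_delay() {
  local exit_status="$1"
  if [ "$exit_status" -ne 0 ]; then
    echo "$RESTART_GRACE_SECONDS"
  else
    echo 0
  fi
}

# Wait out the grace period, then run the start command given as arguments.
restart_after_grace() {
  local exit_status="$1"
  shift
  sleep "$(restart_delay "$exit_status")"
  "$@"
}
```

For example, `restart_after_grace 137 ./bin/hbase-daemon.sh start regionserver` would sleep for the grace period and then issue the start command.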

Attached new patch.

 

> Enhance hbase services autorestart capability to hbase-daemon.sh 
> -----------------------------------------------------------------
>
>                 Key: HBASE-15924
>                 URL: https://issues.apache.org/jira/browse/HBASE-15924
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.98.19
>            Reporter: Loknath Priyatham Teja Singamsetty 
>            Assignee: Loknath Priyatham Teja Singamsetty 
>             Fix For: 0.98.24
>
>         Attachments: HBASE-15924.master.0001.patch, HBASE-15924.master.0002.patch, HBASE-15924.master.0003.patch
>
>
> As part of HBASE-5939, autorestart for hbase services was added to deal with
scenarios where hbase services (master/regionserver/master-backup) get killed or go down,
leading to unplanned outages. The changes were made to hbase-daemon.sh to support the autorestart
option. 
> However, the autorestart implementation doesn't work in standalone mode and, beyond
that, has a few gaps relative to the release notes of HBASE-5939. Here is an
attempt to re-design and fix the functionality, considering all possible use cases for hbase
service operations.
> Release Notes of HBASE-5939:
> ------------------------------------------
> When launched with autorestart, HBase processes will automatically restart if they are
not properly terminated, either by a "stop" command or by a cluster stop. To ensure that it
does not overload the system when the server itself is corrupted and the process cannot be
restarted, the server sleeps for 5 minutes before restarting if it was already started within the
previous 5 minutes. To use it, launch the process with "bin/start-hbase autorestart". This option
is not fully compatible with the existing "restart" command: if you ask for a restart on a
server launched with autorestart, the server will restart but the next server instance won't
be automatically restarted.
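
The restart throttle described in the release notes above can be sketched roughly as follows. This is an illustration under one reading of the notes (back off when the previous start was less than 5 minutes ago); the function and variable names are hypothetical and not taken from hbase-daemon.sh.

```shell
# Hypothetical sketch of the throttle from the HBASE-5939 release notes.
MIN_UPTIME_SECONDS="${MIN_UPTIME_SECONDS:-300}"   # 5 minutes
BACKOFF_SECONDS="${BACKOFF_SECONDS:-300}"         # 5 minutes

# Print how many seconds to sleep before restarting.
# $1 = epoch seconds of the previous start, $2 = current epoch seconds.
backoff_before_restart() {
  local last_start="$1" now="$2"
  if [ $(( now - last_start )) -lt "$MIN_UPTIME_SECONDS" ]; then
    # The process died soon after starting: back off to avoid a restart storm.
    echo "$BACKOFF_SECONDS"
  else
    echo 0
  fi
}
```

A supervising loop could call `sleep "$(backoff_before_restart "$last_start" "$(date +%s)")"` before each restart attempt.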



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
