hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5844) Delete the region servers znode after a regions server crash
Date Wed, 26 Sep 2012 00:48:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463409#comment-13463409 ]

stack commented on HBASE-5844:
------------------------------

Looking at this w/ j-d: we no longer use nohup, so that the parent process can stick around to
watch for the server crash. This means there are now two hbase processes listed per launched
daemon. This is kinda ugly.
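A minimal sketch of the watcher pattern described above, written in Python purely for illustration (the real start script is bash). The parent launches the daemon as a foreground child and blocks in `wait()`, which is why two processes show up per daemon. `run_and_watch` and the `on_crash` hook are hypothetical names; the hook stands in for whatever would delete the crashed server's znode.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_and_watch(cmd: Sequence[str], on_crash: Callable[[int], None]) -> int:
    """Start the daemon as a foreground child and block until it exits.

    The parent stays alive only to wait() on the child, so two processes are
    listed per daemon: this watcher and the daemon itself.
    """
    child = subprocess.Popen(list(cmd))
    returncode = child.wait()  # negative if the child was killed by a signal
    if returncode != 0:
        # Abnormal exit: run the cleanup hook (e.g. a hypothetical znode remover).
        on_crash(returncode)
    return returncode


if __name__ == "__main__":
    def remove_znode(code: int) -> None:
        # Hypothetical placeholder: a real hook would delete the server's znode.
        print(f"daemon exited with {code}; removing znode")

    crashing_daemon = [sys.executable, "-c", "import sys; sys.exit(3)"]
    run_and_watch(crashing_daemon, remove_znode)
```

Note that if the watcher itself is killed (e.g. `kill -9` on the parent), no cleanup runs at all, which is one way such a script falls short of a dedicated babysitter.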

When we have this bash script watching the running java process, we verge into territory
normally occupied by babysitters like supervise. Our parent bash script will always be less
than a real babysitter (supervise, god, etc.), so maybe we should just ship the znode removal
as an optional script, with instructions for how to set it up: e.g. run the znode remover on
daemon crash before starting a new one (if we want supervise to start a new one).
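One way to read the "optional script" proposal, sketched in Python under stated assumptions: a supervise-style restart loop (not the actual daemontools `supervise` program) that runs a cleanup hook after every crash and only then starts a replacement. `supervise_with_cleanup`, `on_crash`, `max_restarts` and `backoff_s` are all hypothetical names chosen for illustration.

```python
import subprocess
import sys
import time
from typing import Callable, List, Sequence


def supervise_with_cleanup(
    cmd: Sequence[str],
    on_crash: Callable[[int], None],
    max_restarts: int = 3,
    backoff_s: float = 1.0,
) -> List[int]:
    """Run `cmd`, restarting it after crashes; return every exit code seen.

    On each non-zero exit the cleanup hook runs before any replacement is
    started. A clean exit (code 0) ends supervision.
    """
    exit_codes: List[int] = []
    for attempt in range(max_restarts + 1):
        code = subprocess.run(list(cmd)).returncode
        exit_codes.append(code)
        if code == 0:
            break  # deliberate shutdown: nothing to clean up or restart
        on_crash(code)  # e.g. a hypothetical znode remover
        if attempt < max_restarts:
            time.sleep(backoff_s)  # brief pause before restarting
    return exit_codes


if __name__ == "__main__":
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    codes = supervise_with_cleanup(
        crashing, lambda c: print("cleanup after exit", c), max_restarts=2, backoff_s=0.1
    )
    print(codes)
```

Keeping the cleanup as a separate hook is what would let it be plugged into an external babysitter instead of living inside the start script.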

I'm thinking we should back this out since there are open questions still.
                
> Delete the region servers znode after a regions server crash
> ------------------------------------------------------------
>
>                 Key: HBASE-5844
>                 URL: https://issues.apache.org/jira/browse/HBASE-5844
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>             Fix For: 0.96.0
>
>         Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch
>
>
> Today, if a region server crashes, its znode is not deleted in ZooKeeper, so the recovery
> process starts only after a timeout, usually 30s.
> By deleting the znode in the start script, we remove this delay and the recovery starts immediately.
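To make the 30s delay concrete, here is a toy simulation (not the ZooKeeper client API) of how a region server's ephemeral znode disappears: either when its session times out, or immediately on an explicit delete. `ToyZooKeeper`, `recovery_start_time` and the `/hbase/rs/example-server` path are hypothetical, and the sketch assumes the start script deletes the znode at the moment of the crash; in practice there is a short gap while the script notices the exit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Watcher = Callable[[str, float], None]


@dataclass
class ToyZooKeeper:
    """Toy model of ephemeral-znode lifetime; NOT the real ZooKeeper API."""

    session_timeout_s: float = 30.0
    # ephemeral znode path -> time of the owning session's last heartbeat
    last_heartbeat: Dict[str, float] = field(default_factory=dict)
    # callbacks invoked as (path, time) when an ephemeral znode disappears
    watchers: List[Watcher] = field(default_factory=list)

    def heartbeat(self, path: str, now: float) -> None:
        """Create the ephemeral znode or refresh its owning session."""
        self.last_heartbeat[path] = now

    def delete(self, path: str, now: float) -> None:
        """Explicit delete: watchers are notified at once."""
        if self.last_heartbeat.pop(path, None) is not None:
            self._notify(path, now)

    def tick(self, now: float) -> None:
        """Expire znodes whose session has been silent for the full timeout."""
        expired = [p for p, t in self.last_heartbeat.items()
                   if now - t >= self.session_timeout_s]
        for path in expired:
            del self.last_heartbeat[path]
            self._notify(path, now)

    def _notify(self, path: str, now: float) -> None:
        for watcher in self.watchers:
            watcher(path, now)


def recovery_start_time(crash_at: float, delete_on_crash: bool,
                        timeout_s: float = 30.0) -> float:
    """Return when the watching side learns that the region server is gone."""
    zk = ToyZooKeeper(session_timeout_s=timeout_s)
    detected: List[float] = []
    zk.watchers.append(lambda path, t: detected.append(t))
    rs_znode = "/hbase/rs/example-server"  # hypothetical path, for illustration
    zk.heartbeat(rs_znode, now=crash_at)  # last heartbeat before the crash
    if delete_on_crash:
        zk.delete(rs_znode, now=crash_at)  # start script removes the stale znode
    now = crash_at
    while not detected:  # otherwise wait, in 1s steps, for the session to expire
        now += 1.0
        zk.tick(now)
    return detected[0]


if __name__ == "__main__":
    print(recovery_start_time(100.0, delete_on_crash=False))  # 130.0
    print(recovery_start_time(100.0, delete_on_crash=True))   # 100.0
```

With the 30s timeout the watcher fires 30s after the crash; with the explicit delete it fires at the crash time itself.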

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
