hbase-issues mailing list archives

From "Jesse Yates (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5075) regionserver crashed and failover
Date Sun, 26 Feb 2012 03:55:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216625#comment-13216625 ]

Jesse Yates commented on HBASE-5075:
------------------------------------

I had the same concerns about the network IO and the (I think) blocking call. However, with the
shutdown hook, I think we can be _more sure_ that it runs than if we put it after the
run method. Also, the hooks run in their own thread, so on shutdown it's not going to block
regular shutdown or any other synchronous operations.

Granted, this doesn't deal with the kill -9 or network partition situations, but if either happens,
you have some big problems anyway, and a minute (or whatever your zk timeout is) of blocking
probably isn't a big deal ;) Also note that in the latter case, the daemon wouldn't
be able to reach zk to remove the node anyway, so you are back to where you were before.
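
To make the shutdown-hook idea above concrete, here is a minimal sketch in plain Java. It is not taken from the attached patches: the class name, the ZNodeDeleter interface, and the znode path are all hypothetical stand-ins for whatever ZooKeeper call the region server would actually make. Only Runtime.getRuntime().addShutdownHook(Thread) is a real JDK API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ZNodeCleanupHook {

    /** Hypothetical stand-in for the ZooKeeper call that removes the server's znode. */
    interface ZNodeDeleter {
        void delete(String path) throws Exception;
    }

    /**
     * Builds the thread that the JVM starts during shutdown.
     * The flag is set only when the delete call returns normally.
     */
    static Thread buildHook(ZNodeDeleter deleter, String path, AtomicBoolean done) {
        return new Thread(() -> {
            try {
                deleter.delete(path);
                done.set(true);
            } catch (Exception e) {
                // For example, ZooKeeper is unreachable (network partition).
                // The znode then only goes away when the session times out.
                System.err.println("could not delete " + path + ": " + e);
            }
        }, "znode-cleanup-hook");
    }

    public static void main(String[] args) {
        AtomicBoolean done = new AtomicBoolean(false);
        // Illustrative path and deleter only; a real deleter would talk to ZooKeeper.
        Thread hook = buildHook(
                p -> System.out.println("deleted " + p),
                "/hbase/rs/example-server",
                done);
        // The hook runs on normal exit and on SIGTERM, but not on kill -9.
        Runtime.getRuntime().addShutdownHook(hook);
        System.out.println("main returning; hook runs during JVM shutdown");
    }
}
```

Running main and letting it return, or stopping the process with a plain kill, should trigger the hook. With kill -9 the JVM gets no chance to run it, which is the case where the zk session timeout still applies.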

                
> regionserver crashed and failover
> ---------------------------------
>
>                 Key: HBASE-5075
>                 URL: https://issues.apache.org/jira/browse/HBASE-5075
>             Project: HBase
>          Issue Type: Improvement
>          Components: monitoring, regionserver, replication, zookeeper
>    Affects Versions: 0.92.1
>            Reporter: zhiyuan.dai
>             Fix For: 0.90.5
>
>         Attachments: Degion of Failure Detection.pdf, HBase-5075-shell.patch, HBase-5075-src.patch
>
>
> When a regionserver crashes, it takes too long to notify the hmaster. Once the hmaster knows of the
> regionserver's shutdown, it takes a long time to fetch the hlog's lease.
> hbase is an online db, so availability is very important.
> I have an idea to improve availability: a monitor node checks the regionserver's pid. If this
> pid does not exist, I consider the rs down, delete the znode, and force close the hlog file.
> The check period may be 100ms.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
