hbase-issues mailing list archives

From "stack (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5075) regionserver crashed and failover
Date Fri, 17 Feb 2012 16:39:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210367#comment-13210367 ]

stack commented on HBASE-5075:
------------------------------

Thanks for doing this.  It looks very interesting.

Please do not reformat existing code.  It bloats your patch and makes reviews take longer;
reviewer attention span is short (at least in this case), and it's a shame to spend it going
over code reformats.

On the patch, is this necessary: +  public String getRSPidAndRsZknode();

Can't you get the pid from a process listing?  Or do you want us to publish it via jmx?  Actually,
it looks like it is already published via jmx; can your tool pick it up there?  As for the znode,
can't you get the regionserver's servername and then look it up in zk directly?
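Following the suggestion to reuse jmx and zk instead of adding getRSPidAndRsZknode(), here is a minimal sketch of what an external tool would need. It assumes the pid is read from the standard JVM Runtime MXBean (for example via MBeanServerConnection.getAttribute on java.lang:type=Runtime, attribute Name), whose value on HotSpot/OpenJDK has the implementation-specific form pid@hostname, and that regionservers register under <zookeeper.znode.parent>/rs/<servername> with the default parent /hbase. RsLocator and its methods are hypothetical names, not HBase API.

```java
/**
 * Hypothetical helper (not part of HBase): locate a regionserver's pid and
 * znode without a new getRSPidAndRsZknode() method.
 * Assumptions: the pid comes from the JVM Runtime MXBean "Name" attribute,
 * formatted "pid@hostname" on HotSpot/OpenJDK (implementation-specific), and
 * the regionserver's ephemeral znode lives at <znodeParent>/rs/<servername>.
 */
final class RsLocator {
    private RsLocator() {}

    /** Parses the pid out of a Runtime MXBean name such as "12345@rs1". */
    static long pidFromRuntimeName(String runtimeName) {
        int at = runtimeName.indexOf('@');
        if (at <= 0) {
            throw new IllegalArgumentException("not in pid@host form: " + runtimeName);
        }
        return Long.parseLong(runtimeName.substring(0, at));
    }

    /** Builds the znode path of a regionserver under the given parent znode. */
    static String rsZnode(String znodeParent, String serverName) {
        // Tolerate a trailing slash on the configured parent.
        String parent = znodeParent.endsWith("/")
            ? znodeParent.substring(0, znodeParent.length() - 1)
            : znodeParent;
        return parent + "/rs/" + serverName;
    }
}
```

For example, rsZnode("/hbase", "rs1.example.com,60020,1329496799000") yields /hbase/rs/rs1.example.com,60020,1329496799000; the servername is a made-up host,port,startcode value.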

Can't you have supervisor do this?  Aren't there existing utilities that watch a pid and let
you do stuff when it's gone?  Or is it that you'd kill the server if it hit a long GC pause?
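On the question of existing pid watchers: besides external supervisors, newer JDKs (Java 9 and later, so newer than the JVMs of this thread) can wait on an arbitrary pid through ProcessHandle.onExit(). A minimal sketch, assuming such a JVM runs the monitoring tool; PidWatch and whenGone are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical sketch (not HBase code): react to a pid disappearing using the
 * JDK's own process watching (Java 9+ ProcessHandle) instead of new RPCs.
 */
final class PidWatch {
    private PidWatch() {}

    /**
     * Returns a future that completes after onGone has run. onGone runs once
     * the process with this pid has exited, or right away if no such process
     * is found. Do not pass the current JVM's own pid: onExit() rejects it.
     */
    static CompletableFuture<Void> whenGone(long pid, Runnable onGone) {
        return ProcessHandle.of(pid)
            .map(handle -> handle.onExit().thenRun(onGone))
            .orElseGet(() -> CompletableFuture.runAsync(onGone));
    }
}
```

This is event-driven rather than a fixed-interval poll from the caller's point of view, though the JDK may itself poll for processes that are not its children.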

Do you have a bit of documentation on how this new utility works?

Thanks.
                
> regionserver crashed and failover
> ---------------------------------
>
>                 Key: HBASE-5075
>                 URL: https://issues.apache.org/jira/browse/HBASE-5075
>             Project: HBase
>          Issue Type: Improvement
>          Components: monitoring, regionserver, replication, zookeeper
>    Affects Versions: 0.92.1
>            Reporter: zhiyuan.dai
>             Fix For: 0.90.5
>
>         Attachments: 5075.patch
>
>
> The regionserver crashed, and it took too long to notify the HMaster. Even once the HMaster
> knows the regionserver has shut down, it takes a long time to obtain the lease on its HLog.
> HBase is an online DB, so availability is very important.
> I have an idea to improve availability: a monitor node checks the regionserver's pid. If the pid
> no longer exists, I consider the RS down, delete its znode, and force-close its HLog file.
> That way the detection period could be around 100 ms.
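The reporter's proposal amounts to a fixed-rate poll of the regionserver pid. A minimal sketch, assuming a Java 9+ JVM for ProcessHandle; PidPoller is a hypothetical name, and the failover steps (deleting the znode, force-closing the HLog) are left to a caller-supplied callback rather than implemented here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch of the proposed monitor (not HBase code): poll a pid
 * every periodMillis and run a failover action once the process is gone.
 */
final class PidPoller implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean fired = new AtomicBoolean(false);

    PidPoller(long pid, long periodMillis, Runnable onDead) {
        scheduler.scheduleAtFixedRate(() -> {
            // A pid that cannot be found at all also counts as dead.
            boolean alive = ProcessHandle.of(pid)
                .map(ProcessHandle::isAlive)
                .orElse(false);
            // compareAndSet ensures the failover action runs at most once.
            if (!alive && fired.compareAndSet(false, true)) {
                try {
                    onDead.run();
                } finally {
                    scheduler.shutdown(); // stop polling after the first detection
                }
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** True once the process has been seen as gone. */
    boolean fired() {
        return fired.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Wired to the proposal, a call might look like new PidPoller(rsPid, 100, () -> { deleteRsZnode(); recoverHLogLease(); }), where rsPid and both callbacks are hypothetical placeholders. A pid check only detects a process that has exited: a regionserver stuck in a long GC pause still has a live pid, and a recycled pid could mask a crash.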

--
This message is automatically generated by JIRA.

        
