hbase-issues mailing list archives

From "Ian Knome (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3580) Remove RS from DeadServer when new instance checks in
Date Fri, 04 Mar 2011 20:49:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002806#comment-13002806 ]

Ian Knome commented on HBASE-3580:
----------------------------------

jdcryans: Thanks for the review comments. I had already figured out that something was not
quite right and prepared the next version of the patch, which addresses your concerns.

I have attached the patch. The only thing I am not sure about is this:

The patch assumes that a server name in the deadServer set is always in the format
<serverName,port,startcode>. Should we also handle cases where it could be
<serverName:port>?

Please let me know and I will be more than happy to address that as well.
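
To make the question concrete, here is a minimal sketch of how both name formats could be
reduced to a common hostname+port key before comparing against the dead server set. This is
not the code in the attached patch: the class DeadServerSketch and the methods hostAndPort,
markDead, cleanPreviousInstance and isDead are hypothetical names used only for illustration,
and the sketch assumes plain host names (no IPv6 literals).

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, NOT part of HBase: shows one way to treat
// "host,port,startcode" and "host:port" uniformly.
final class DeadServerSketch {
    // Dead server names, stored exactly as they were reported.
    private final Set<String> deadServers = new TreeSet<>();

    /** Reduces "host,port,startcode" or "host:port" to the key "host:port". */
    static String hostAndPort(String serverName) {
        String[] parts = serverName.split(",");
        if (parts.length == 3) {
            // Full form: drop the startcode, keep host and port.
            return parts[0] + ":" + parts[1];
        }
        if (parts.length == 1 && serverName.indexOf(':') > 0) {
            // Already in host:port form.
            return serverName;
        }
        throw new IllegalArgumentException("Unrecognized server name: " + serverName);
    }

    void markDead(String serverName) {
        deadServers.add(serverName);
    }

    /**
     * Removes every dead entry that shares host and port with the instance
     * checking in. Returns true if at least one entry was removed.
     */
    boolean cleanPreviousInstance(String newServerName) {
        String key = hostAndPort(newServerName);
        return deadServers.removeIf(dead -> hostAndPort(dead).equals(key));
    }

    boolean isDead(String serverName) {
        return deadServers.contains(serverName);
    }
}
```

With this approach, a new instance checking in with a different startcode clears the stale
entry for the same host and port, whichever of the two formats the stale entry was stored in.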

> Remove RS from DeadServer when new instance checks in
> -----------------------------------------------------
>
>                 Key: HBASE-3580
>                 URL: https://issues.apache.org/jira/browse/HBASE-3580
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.0
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.90.2
>
>         Attachments: HBASE-3580-Remove-RS-from-DeadServer-when-new-instance-checks-in.patch,
HBASE-3580_-_Remove_RS_from_dead_server_when_the_RS_when_new_instance_checks_in3.patch
>
>
> Keeping the servers in DeadServer until it reaches some maximum isn't super friendly,
it confuses even the best of our users:
> {quote}
> 09:27 < gbowyer> Hi all, I have apparently three dead RS in my cluster, I cannot
find references to them in HDFS or in ZK, how do I still report dead RS
> 09:27 < gbowyer> also the same nodes are reported as live region servers
> {quote}
> The subtle startcode difference can be hard to catch. This behavior also differs from 0.20 (so old users get confused, like I did when debugging this problem) and from Hadoop's handling of dead DataNodes. It was introduced in HBASE-3282.
> I think this should be improved by doing what Hadoop does: removing the RS from DeadServers when a new instance with the same hostname+port checks in. Stack says we should do it in ServerManager.checkIsDead.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
