hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3580) Remove RS from DeadServer when new instance checks in
Date Fri, 04 Mar 2011 00:17:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002348#comment-13002348 ]

Jean-Daniel Cryans commented on HBASE-3580:

In DeadServer.remove, don't decrement numprocessing since it's already done way before that.

The new DeadServer.isDeadServerComingBackAlive doesn't look right. It says the server name
can be passed in either form, but then tells HSI.isServer that it's passing hostAndPortOnly.

In ServerManager, I don't think you should use 2 methods... it looks more confusing than it
should be:

+    if (!IsDeadServerComingBackAlive(serverName)) {
+      if (!this.deadservers.isDeadServer(serverName)) return;

BTW don't use upper case for the first letter of the first word in the method name.

Also this won't work:


Since it's the full server name (host,port,startcode) and we check if it's not dead, then
it definitely shouldn't be in there!

What should be done instead is checking if the full servername is in DeadServer, if not then
also check if its host+port form is in there and if it is then delete it.
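
To make that concrete, here is a minimal standalone sketch of the check described above. It is not the actual patch: DeadServerSketch, hostAndPort and cleanPreviousInstance are hypothetical names, and the set of strings only stands in for whatever DeadServer really keeps internally. It assumes server names have the form host,port,startcode.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for DeadServer; not the real HBase class.
class DeadServerSketch {
    // Full server names ("host,port,startcode") of servers considered dead.
    private final Set<String> deadServers = new HashSet<>();

    void add(String fullServerName) {
        deadServers.add(fullServerName);
    }

    boolean isDeadServer(String fullServerName) {
        return deadServers.contains(fullServerName);
    }

    // "host,port,startcode" -> "host,port" (assumed name format).
    static String hostAndPort(String fullServerName) {
        int idx = fullServerName.lastIndexOf(',');
        return idx < 0 ? fullServerName : fullServerName.substring(0, idx);
    }

    // Called when a server checks in. If this exact instance is listed as
    // dead, nothing is removed. Otherwise any dead entry with the same
    // host+port (an older startcode) is deleted. Returns true if an entry
    // was removed.
    boolean cleanPreviousInstance(String newServerName) {
        if (deadServers.contains(newServerName)) {
            return false;
        }
        // Trailing comma so "host,6002" does not match "host,60020,...".
        String prefix = hostAndPort(newServerName) + ",";
        return deadServers.removeIf(dead -> dead.startsWith(prefix));
    }
}
```

With this shape the caller only needs one method: a new instance checking in on the same host+port clears the old entry, while a server whose full name is still listed as dead is left alone.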

> Remove RS from DeadServer when new instance checks in
> -----------------------------------------------------
>                 Key: HBASE-3580
>                 URL: https://issues.apache.org/jira/browse/HBASE-3580
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.0
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.90.2
>         Attachments: HBASE-3580-Remove-RS-from-DeadServer-when-new-instance-checks-in.patch
> Keeping the servers in DeadServer until it reaches some maximum isn't super friendly, it confuses even the best of our users:
> {quote}
> 09:27 < gbowyer> Hi all, I have apparently three dead RS in my cluster, I cannot find references to them in HDFS or in ZK, how do I still report dead RS
> 09:27 < gbowyer> also the same nodes are reported as live region servers
> {quote}
> The subtle startcode difference can be hard to catch. This behavior also differs from 0.20 (so old users get confused, like I did when debugging this problem) and from Hadoop's handling of dead DataNodes. It was introduced in HBASE-3282.
> I think this should be improved by doing what Hadoop does: removing the RS from DeadServer when a new instance with the same hostname+port checks in. Stack says we should do it in ServerManager.checkIsDead.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

