hbase-issues mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3280) YouAreDeadException being swallowed in HRS getMaster()
Date Mon, 29 Nov 2010 16:21:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964799#action_12964799 ]

Jonathan Gray commented on HBASE-3280:
--------------------------------------

Somewhat related to this: on a cluster here, the HRS got stuck in this loop, trying to
reconnect to the master and ignoring the YouAreDeadExceptions.  Once the master finished
shutdown handling, it removed this server from the dead server list.  The RS then
successfully heartbeated in to the master, and the master treated it as a legitimate RS
(even though it had just finished shutting it down).

Is there a reason we should ever clear things out of the dead server list?  If this RS is
in a network partition, it may not check back with the master for a long time, so shouldn't
we always remember the dead serverNames (which include start codes)?
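
To make the proposal above concrete, here is a minimal sketch of a dead server list that is
never cleared.  It is not HBase's actual code: DeadServerRegistry, finishShutdownHandling()
and acceptReport() are hypothetical names, and the "host,port,startcode" serverName format
is an assumption used only for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the real HBase master code.
final class DeadServerRegistry {
    // Full server names, assumed here to look like "host,port,startcode".
    private final Set<String> dead = ConcurrentHashMap.newKeySet();

    // Record a server instance as dead when its shutdown handling starts.
    void markDead(String serverName) {
        dead.add(serverName);
    }

    // Called when shutdown handling completes. Deliberately keeps the entry,
    // so a late heartbeat from the same instance is still rejected.
    void finishShutdownHandling(String serverName) {
        // intentionally no dead.remove(serverName)
    }

    // True if the report may be accepted; false means the caller should
    // answer with the equivalent of a YouAreDeadException.
    boolean acceptReport(String serverName) {
        return !dead.contains(serverName);
    }
}
```

Because the start code is part of the name, a restarted RS on the same host and port gets a
new serverName and is not blocked by the old entry; only the stale instance keeps being
rejected.  The cost is that the set grows without bound, which is the trade-off the question
above is really asking about.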

> YouAreDeadException being swallowed in HRS getMaster()
> ------------------------------------------------------
>
>                 Key: HBASE-3280
>                 URL: https://issues.apache.org/jira/browse/HBASE-3280
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>             Fix For: 0.90.0, 0.92.0
>
>
> In the HRS, when we lose our connection to the master, we enter into a loop where we
> keep trying to get the new master location in ZK and attempt to send our heartbeat.  Within
> tryRegionServerReport() we could get a YouAreDeadException, but we won't let it out.  This
> leads to the RS continuously heartbeating in to the master although the master keeps telling
> it to kill itself.
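
The behaviour described in the issue can be sketched as a retry loop that treats transient
I/O failures and the "you are dead" signal differently.  This is only an illustration under
stated assumptions, not the actual HRS code or the committed fix: the YouAreDeadException
class below is a local stand-in, and MasterLink, ReportLoop and reportWithRetry() are
hypothetical names.

```java
import java.io.IOException;

// Local stand-in for the exception the master sends; not the HBase class.
class YouAreDeadException extends IOException {
    YouAreDeadException(String message) {
        super(message);
    }
}

// Hypothetical abstraction over "send one heartbeat to the master".
interface MasterLink {
    void report() throws IOException;
}

final class ReportLoop {
    // Retries transient failures up to maxAttempts and returns the number of
    // attempts used. A YouAreDeadException is rethrown immediately so the
    // caller can abort instead of heartbeating again.
    static int reportWithRetry(MasterLink master, int maxAttempts) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                master.report();
                return attempt;
            } catch (YouAreDeadException e) {
                throw e; // do not swallow: the master has declared us dead
            } catch (IOException e) {
                last = e; // transient failure, try again
            }
        }
        throw last != null ? last : new IOException("no report attempts were made");
    }
}
```

The order of the catch clauses matters: because the stand-in extends IOException, the more
specific handler has to come first, otherwise the generic handler would swallow it, which is
the kind of bug this issue describes.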

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

