hbase-issues mailing list archives

From "Jerry He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14000) Region server failed to report Master and stuck in reportForDuty retry loop
Date Tue, 07 Jul 2015 19:14:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617210#comment-14617210 ]

Jerry He commented on HBASE-14000:

In HBASE-13317, we try to be conservative if the region server gets a ServerNotRunningYetException
during reportForDuty.
ServerNotRunningYetException means the master may still be initializing, so there may not
be an immediate need to try a new RPC connection.

In your case, do you see the loop stuck for a long time, meaning that the old master continued
to return ServerNotRunningYetException for a long time? 
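
To make the scenario concrete, here is a minimal, self-contained sketch of the retry behavior discussed above. It is a toy model, not HBase code: ReportForDutySketch, MasterStub, failoverLookup and the resetOnNotRunningYet flag are hypothetical names invented for illustration, and the exception class is a local stand-in for the real ServerNotRunningYetException. The flag exists only to compare the two behaviors; it is not a claim about what the attached patch does.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class ReportForDutySketch {

    /** Local stand-in for HBase's ServerNotRunningYetException (not the real class). */
    public static class ServerNotRunningYetException extends IOException {
        public ServerNotRunningYetException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical minimal view of the RPC stub a region server holds for a master. */
    public interface MasterStub {
        void regionServerStartup() throws IOException;
    }

    /**
     * Simplified retry loop. Returns the 1-based attempt on which reporting
     * succeeded, or -1 if every attempt failed.
     */
    public static int reportForDuty(Supplier<MasterStub> activeMasterLookup,
                                    int maxAttempts,
                                    boolean resetOnNotRunningYet) {
        MasterStub stub = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (stub == null) {
                // Only look up the active master when no stub is cached.
                stub = activeMasterLookup.get();
            }
            try {
                stub.regionServerStartup();
                return attempt;
            } catch (ServerNotRunningYetException e) {
                // Conservative behavior: assume the master is still initializing
                // and keep the cached stub, unless the flag asks for a reset.
                if (resetOnNotRunningYet) {
                    stub = null;
                }
            } catch (IOException e) {
                // Any other failure: drop the stub so the next attempt reconnects.
                stub = null;
            }
        }
        return -1;
    }

    /** Models a master switch: the first lookup finds the old master, later ones the new one. */
    public static Supplier<MasterStub> failoverLookup() {
        MasterStub oldMaster = () -> {
            throw new ServerNotRunningYetException("Master is not running yet");
        };
        MasterStub newMaster = () -> { };
        AtomicInteger lookups = new AtomicInteger();
        return () -> lookups.getAndIncrement() == 0 ? oldMaster : newMaster;
    }

    public static void main(String[] args) {
        // Stub is kept: the loop keeps talking to the old master and never succeeds.
        System.out.println(reportForDuty(failoverLookup(), 5, false)); // prints -1
        // Stub is reset: the second attempt reaches the new active master.
        System.out.println(reportForDuty(failoverLookup(), 5, true));  // prints 2
    }
}
```

In this model the loop stays stuck only for as long as the old master keeps answering with ServerNotRunningYetException, which is why the duration of that state matters.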

> Region server failed to report Master and stuck in reportForDuty retry loop
> ---------------------------------------------------------------------------
>                 Key: HBASE-14000
>                 URL: https://issues.apache.org/jira/browse/HBASE-14000
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Pankaj Kumar
>            Assignee: Pankaj Kumar
>         Attachments: HBASE-14000.patch
> In an HA cluster, a region server gets stuck in the reportForDuty retry loop if the active master
> is restarting and a master switch happens later, before the region server reports successfully.
> The root cause is the same as HBASE-13317, but the region server tried to connect to the master when
> it was starting, so the rssStub reset didn't happen because of:
> {code}
>   if (ioe instanceof ServerNotRunningYetException) {
>     // Only logs; rssStub is not reset on this branch.
>     LOG.debug("Master is not running yet");
>   }
> {code}
> When the master started, a master switch happened, so the RS always tried to connect to the standby

This message was sent by Atlassian JIRA
