hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5916) RS restart just before master intialization we make the cluster non operative
Date Thu, 03 May 2012 05:34:02 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267205#comment-13267205 ]

ramkrishna.s.vasudevan commented on HBASE-5916:
-----------------------------------------------

Yes, the RS is slow in checking in.
The problem here is that the master will split the logs of the newly checked-in RS, because
that RS is not yet in the online list.
{code}
Set<ServerName> onlineServers = new HashSet<ServerName>(serverManager
    .getOnlineServers().keySet());
// TODO: Should do this in background rather than block master startup
status.setStatus("Splitting logs after master startup");
splitLogAfterStartup(this.fileSystemManager, onlineServers);
{code}
During the split, if the master finds any log folder that does not belong to one of the servers
in 'onlineServers' (the online list has already been captured by this point), it calls split log
and finally deletes the log folder.  So although the server is online, it will not be able to use
its HLog and it gets a FileNotFoundException.
Sorry if I am not clear.
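
To make the timing issue concrete, here is a minimal standalone sketch, not the actual master code. The class name, the foldersToSplit helper and the server name strings are all hypothetical and only illustrate the idea that the online list is a snapshot, so a server that checks in after the snapshot still has its log folder selected for splitting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in HBase.
public class SplitLogRaceSketch {

    // Returns the log folders that would be split and then deleted: every
    // folder whose owning server is missing from the online-server snapshot.
    static List<String> foldersToSplit(Set<String> onlineSnapshot, List<String> logFolders) {
        List<String> toSplit = new ArrayList<>();
        for (String folder : logFolders) {
            if (!onlineSnapshot.contains(folder)) {
                toSplit.add(folder);
            }
        }
        return toSplit;
    }

    public static void main(String[] args) {
        // Snapshot taken while only rs1 has checked in (made-up server names).
        Set<String> onlineSnapshot = new HashSet<>(Arrays.asList("rs1,60020,100"));

        // rs2 checks in after the snapshot, but its log folder already exists.
        List<String> logFolders = Arrays.asList("rs1,60020,100", "rs2,60020,200");

        // rs2's folder is picked for splitting even though rs2 is alive by now.
        System.out.println(foldersToSplit(onlineSnapshot, logFolders));
    }
}
```

In this sketch the late server's folder ends up in the list to split, which corresponds to the case above where the live RS later fails to find its log.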

                
> RS restart just before master intialization we make the cluster non operative
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-5916
>                 URL: https://issues.apache.org/jira/browse/HBASE-5916
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1, 0.94.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.94.1
>
>
> Consider a case where my master is getting restarted.  An RS that was alive when the master
> restart started gets restarted before the master initializes the ServerShutdownHandler.
> {code}
> serverShutdownHandlerEnabled = true;
> {code}
> In this case, when the RS tries to register with the master, the master will try to expire
> the server, but the server cannot be expired because the serverShutdownHandler is not yet enabled.
> This may happen when only one RS is restarted or when all the RSs are restarted
> at the same time (before assignRootandMeta).
> {code}
> LOG.info(message);
> if (existingServer.getStartcode() < serverName.getStartcode()) {
>   LOG.info("Triggering server recovery; existingServer " +
>     existingServer + " looks stale, new server:" + serverName);
>   expireServer(existingServer);
> }
> {code}
> If another RS is brought up, the cluster returns to normal.
> This may be a very rare corner case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
