hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-8910) HMaster.abortNow shouldn't try to become a master again if it was stopped
Date Wed, 10 Jul 2013 17:25:49 GMT

     [ https://issues.apache.org/jira/browse/HBASE-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-8910:
--------------------------------------

    Summary: HMaster.abortNow shouldn't try to become a master again if it was stopped  (was: TestReplicationDisableInactivePeer fails if the master we shutdown comes back to life)

Thanks Stack. I'm changing the title to reflect the origin of the problem, as it seems it
could happen not just during unit tests but also when doing a normal shutdown of a master.
                
> HMaster.abortNow shouldn't try to become a master again if it was stopped
> -------------------------------------------------------------------------
>
>                 Key: HBASE-8910
>                 URL: https://issues.apache.org/jira/browse/HBASE-8910
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.98.0, 0.95.2, 0.94.10
>
>         Attachments: HBASE-8910.patch
>
>
> Here's a case where TestReplicationDisableInactivePeer fails while re-starting the second master:
> http://54.241.6.143/job/HBase-0.95-Hadoop-2/574/org.apache.hbase$hbase-server/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationDisableInactivePeer/testDisableInactivePeer/
> The reason is that when we first shut down the master, it comes back to life thinking it just lost its session:
> {noformat}
> 2013-07-07 04:27:03,989 FATAL [pool-1-thread-1-EventThread] master.HMaster(2062): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
> 2013-07-07 04:27:03,989 INFO  [pool-1-thread-1-EventThread] master.HMaster(2155): Primary Master trying to recover from ZooKeeper session expiry.
> {noformat}
> And after that it tries to assign .META., which fails since the RS are down.
> One way I think we can prevent this is to skip recovering the session if we are stopping.
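
For illustration only, here is a minimal sketch of the guard described in the last sentence above. It is not the attached HBASE-8910.patch and not HBase's actual HMaster code: the class MasterSketch, its fields, and every method on it (including this simplified abortNow) are hypothetical stand-ins, and the real ZooKeeper session handling is reduced to a counter.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a master process; NOT HBase's HMaster. */
class MasterSketch {
    // Set once a shutdown has been requested (assumed flag, for illustration).
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    // Counts how often session recovery was attempted.
    private int recoveryAttempts = 0;

    void stop() {
        stopped.set(true);
    }

    /**
     * Simplified abort path. Returns true if the abort was averted by
     * recovering the expired session, false if the master should go down.
     */
    boolean abortNow(boolean sessionExpired) {
        if (stopped.get()) {
            // We are stopping: do not try to become a master again.
            return false;
        }
        if (sessionExpired) {
            return tryRecoveringExpiredSession();
        }
        return false;
    }

    // Placeholder for real recovery logic; always succeeds in this sketch.
    private boolean tryRecoveringExpiredSession() {
        recoveryAttempts++;
        return true;
    }

    int recoveryAttempts() {
        return recoveryAttempts;
    }

    public static void main(String[] args) {
        MasterSketch running = new MasterSketch();
        System.out.println(running.abortNow(true));  // prints "true"

        MasterSketch stopping = new MasterSketch();
        stopping.stop();
        System.out.println(stopping.abortNow(true)); // prints "false"
    }
}
```

The point of the sketch is only the ordering: the stopped check comes before any attempt to recover the session, so a master that was shut down on purpose never re-enters the "become master" path.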

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
