hbase-dev mailing list archives

From "Pankaj Kumar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-16805) HMaster may send reportForDuty himself while shutting down
Date Tue, 11 Oct 2016 03:20:20 GMT
Pankaj Kumar created HBASE-16805:
------------------------------------

             Summary: HMaster may send reportForDuty himself while shutting down
                 Key: HBASE-16805
                 URL: https://issues.apache.org/jira/browse/HBASE-16805
             Project: HBase
          Issue Type: Bug
          Components: master
            Reporter: Pankaj Kumar
            Assignee: Pankaj Kumar
            Priority: Minor


We hit an interesting scenario in which the HMaster sent reportForDuty to itself while
shutting down.

Initially the HMaster had registered itself as the active master, but it could not finish its
initialization because the namespace table was not assigned within the specified time:
{noformat}
2016-07-30 19:36:52,161 | FATAL | hadoopc1h2:21300.activeMasterManager | Failed to become
active master | org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1610)
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
	at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:102)
	at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:977)
	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:763)
	at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:171)
	at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1606)
	at java.lang.Thread.run(Thread.java:745)
2016-07-30 19:36:52,162 | FATAL | hadoopc1h2:21300.activeMasterManager | Master server abort:
loaded coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver,
org.apache.hadoop.hbase.JMXListener] | org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1981)
2016-07-30 19:36:52,162 | FATAL | hadoopc1h2:21300.activeMasterManager | Unhandled exception.
Starting shutdown. | org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1984)
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
	at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:102)
	at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:977)
	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:763)
	at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:171)
	at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1606)
	at java.lang.Thread.run(Thread.java:745)
2016-07-30 19:36:52,187 | INFO  | master/hadoopc1h2/172.16.19.51:21300 | reportForDuty to
master=hadoopc1h2,21300,1469877905979 with port=21300, startcode=1469877905979 | org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2271)
2016-07-30 19:36:52,198 | INFO  | hadoopc1h2:21300.activeMasterManager | ConnectorServer stopped!
| org.apache.hadoop.hbase.JMXListener.stopConnectorServer(JMXListener.java:160)
{noformat}
In the second-to-last log entry above, the HMaster sent reportForDuty to itself.


Background:
1) During master startup, HMasterCommandLine constructs the HMaster, which starts another thread
that waits to become the active master:
{code}
	startActiveMasterManager(infoPort);
{code}
 
2) At the same time, after constructing the HMaster, HMasterCommandLine starts the HMaster thread:
{code}
        HMaster master = HMaster.constructMaster(masterClass, conf, csm);
        if (master.isStopped()) {
          LOG.info("Won't bring the Master up as a shutdown is requested");
          return 1;
        }
        master.start();
        master.join();
{code}
This thread then waits in the following code flow:
{noformat}
	HRegionServer
		run()
		   preRegistrationInitialization()
		      initializeZooKeeper()
			waitForMasterActive()
{noformat}

3) In HMaster,
{code}
  protected void waitForMasterActive(){
    boolean tablesOnMaster = BaseLoadBalancer.tablesOnMaster(conf);
    while (!(tablesOnMaster && isActiveMaster)
        && !isStopped() && !isAborted()) {
      sleeper.sleep();
    }
  }
{code}
Because "hbase.balancer.tablesOnMaster" is not configured, tablesOnMaster is false here, so
becoming the active master never ends the loop. The HMaster waits here until it is stopped or
aborted.
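The loop condition above can be checked in isolation. The following is a minimal sketch, not HBase code: the class WaitForMasterActiveSketch and the method keepWaiting are hypothetical names that only restate the boolean expression from waitForMasterActive().

```java
// Hypothetical stand-alone model of the wait condition; not HBase code.
final class WaitForMasterActiveSketch {

  // Restates the loop condition quoted above: keep waiting while the master
  // is not (tablesOnMaster && active) and is neither stopped nor aborted.
  static boolean keepWaiting(boolean tablesOnMaster, boolean isActiveMaster,
                             boolean stopped, boolean aborted) {
    return !(tablesOnMaster && isActiveMaster) && !stopped && !aborted;
  }

  public static void main(String[] args) {
    // tablesOnMaster is false: becoming active does not end the wait.
    System.out.println(keepWaiting(false, true, false, false)); // true
    // Only a stop or an abort releases the waiting thread.
    System.out.println(keepWaiting(false, true, false, true));  // false
    System.out.println(keepWaiting(false, true, true, false));  // false
    // With tablesOnMaster enabled, becoming active would end the wait.
    System.out.println(keepWaiting(true, true, false, false));  // false
  }
}
```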


When the HMaster fails to complete its initialization (because the namespace table was not
assigned), it aborts:
{noformat}
	abort("Unhandled exception. Starting shutdown.", t);
{noformat}

So, once the HMaster aborts, the thread from step 2 stops waiting and, as it continues, sends
a reportForDuty to the active master, which is the HMaster itself:
{code}
      // Try and register with the Master; tell it we are here.  Break if
      // server is stopped or the clusterup flag is down or hdfs went wacky.
      while (keepLooping()) {
        RegionServerStartupResponse w = reportForDuty();
        if (w == null) {
          LOG.warn("reportForDuty failed; sleeping and then retrying.");
          this.sleeper.sleep();
        } else {
          handleReportForDutyResponse(w);
          break;
        }
      }
{code}
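The sequence described above can be reproduced with a small model. This is only a sketch under stated assumptions, not HBase code: the class ReportForDutyRaceSketch, its flags, and its counter are hypothetical, and the model assumes for illustration that the registration loop checks only a stopped flag which is set after the abort has already released the waiting thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the race described above; none of this is HBase code.
final class ReportForDutyRaceSketch {
  final AtomicBoolean aborted = new AtomicBoolean(false);
  final AtomicBoolean stopped = new AtomicBoolean(false);
  final AtomicInteger reports = new AtomicInteger(0);

  // Stand-in for waitForMasterActive() when tablesOnMaster is not configured.
  void waitForMasterActive() throws InterruptedException {
    while (!stopped.get() && !aborted.get()) {
      Thread.sleep(10);
    }
  }

  // Assumption for illustration: the loop condition looks only at "stopped".
  boolean keepLooping() {
    return !stopped.get();
  }

  // Stand-in for the step-2 thread: wait, then try to register.
  void startupThreadBody() throws InterruptedException {
    waitForMasterActive();
    while (keepLooping()) {
      reports.incrementAndGet(); // stand-in for reportForDuty()
      break;
    }
  }

  // Runs the model once and returns how many reports were sent.
  static int simulate() {
    ReportForDutyRaceSketch m = new ReportForDutyRaceSketch();
    Thread startup = new Thread(() -> {
      try {
        m.startupThreadBody();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    startup.start();
    m.aborted.set(true); // the abort releases the waiting thread
    try {
      startup.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    m.stopped.set(true); // in this model the stop flag lands only afterwards
    return m.reports.get();
  }

  public static void main(String[] args) {
    System.out.println("reportForDuty calls during shutdown: " + simulate());
  }
}
```

In this model the step-2 thread sends one report even though the abort has already been requested, which matches the reportForDuty entry seen in the log between the abort and the JMX connector shutdown.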




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
