hbase-issues mailing list archives

From "Pankaj Kumar (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-16805) HMaster may send reportForDuty himself while shutting down
Date Tue, 11 Oct 2016 03:23:20 GMT

     [ https://issues.apache.org/jira/browse/HBASE-16805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pankaj Kumar updated HBASE-16805:
---------------------------------
    Description: 
We hit an interesting scenario in which HMaster sent reportForDuty to itself while shutting
down.

Initially HMaster registered itself as the active master, but it could not finish its initialization
because the namespace table was not assigned within the specified time:
{noformat}
2016-07-30 19:36:52,161 | FATAL | hadoopc1h2:21300.activeMasterManager | Failed to become active master | org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1610)
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
	at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:102)
	at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:977)
	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:763)
	at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:171)
	at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1606)
	at java.lang.Thread.run(Thread.java:745)
2016-07-30 19:36:52,162 | FATAL | hadoopc1h2:21300.activeMasterManager | Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver, org.apache.hadoop.hbase.JMXListener] | org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1981)
2016-07-30 19:36:52,162 | FATAL | hadoopc1h2:21300.activeMasterManager | Unhandled exception. Starting shutdown. | org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1984)
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
	at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:102)
	at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:977)
	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:763)
	at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:171)
	at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1606)
	at java.lang.Thread.run(Thread.java:745)
2016-07-30 19:36:52,187 | INFO  | master/hadoopc1h2/machine-ip:21300 | reportForDuty to master=hadoopc1h2,21300,1469877905979 with port=21300, startcode=1469877905979 | org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2271)
2016-07-30 19:36:52,198 | INFO  | hadoopc1h2:21300.activeMasterManager | ConnectorServer stopped! | org.apache.hadoop.hbase.JMXListener.stopConnectorServer(JMXListener.java:160)
{noformat}
In the second-to-last log entry above, HMaster sent reportForDuty to itself.


Background:
1) During master startup, HMasterCommandLine constructs the HMaster, which starts another thread
that waits to become the active master:
{code}
	startActiveMasterManager(infoPort);
{code}
 
2) Meanwhile, after constructing the HMaster, HMasterCommandLine starts the HMaster thread:
{code}
        HMaster master = HMaster.constructMaster(masterClass, conf, csm);
        if (master.isStopped()) {
          LOG.info("Won't bring the Master up as a shutdown is requested");
          return 1;
        }
        master.start();
        master.join();
{code}
This thread then waits in the following call path:
{noformat}
	HRegionServer
		run()
		   preRegistrationInitialization()
		      initializeZooKeeper()
			waitForMasterActive()
{noformat}

3) In HMaster, waitForMasterActive() is implemented as:
{code}
  protected void waitForMasterActive(){
    boolean tablesOnMaster = BaseLoadBalancer.tablesOnMaster(conf);
    while (!(tablesOnMaster && isActiveMaster)
        && !isStopped() && !isAborted()) {
      sleeper.sleep();
    }
  }
{code}
Because "hbase.balancer.tablesOnMaster" is not configured, the first condition stays true
regardless of isActiveMaster, so HMaster waits here until it is stopped or aborted.
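
To make the loop condition concrete, here is a minimal standalone sketch of the same predicate. The class and method names (WaitPredicateDemo, keepsWaiting) are hypothetical and not part of HBase; only the boolean expression mirrors the snippet above.

```java
public class WaitPredicateDemo {
    // Mirrors the loop condition of waitForMasterActive() quoted above.
    // Returns true while the calling thread would keep sleeping.
    // Hypothetical helper for illustration; not an HBase method.
    public static boolean keepsWaiting(boolean tablesOnMaster, boolean isActiveMaster,
                                       boolean stopped, boolean aborted) {
        return !(tablesOnMaster && isActiveMaster) && !stopped && !aborted;
    }

    public static void main(String[] args) {
        // tablesOnMaster is false: becoming active does not end the wait.
        System.out.println(keepsWaiting(false, true, false, false));  // true
        // An abort ends the wait even though initialization never finished.
        System.out.println(keepsWaiting(false, true, false, true));   // false
        // With tablesOnMaster true, becoming active ends the wait.
        System.out.println(keepsWaiting(true, true, false, false));   // false
    }
}
```

The second case is the one that matters here: the wait is released by the abort, not by a successful activation.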


When HMaster fails to complete its initialization (because the namespace table was not assigned),
it aborts:
{noformat}
	abort("Unhandled exception. Starting shutdown.", t);
{noformat}

Once HMaster aborts, the thread from step 2 stops waiting and, as it continues, sends
reportForDuty to the active master, which is HMaster itself:
{code}
      // Try and register with the Master; tell it we are here.  Break if
      // server is stopped or the clusterup flag is down or hdfs went wacky.
      while (keepLooping()) {
        RegionServerStartupResponse w = reportForDuty();
        if (w == null) {
          LOG.warn("reportForDuty failed; sleeping and then retrying.");
          this.sleeper.sleep();
        } else {
          handleReportForDutyResponse(w);
          break;
        }
      }
{code}
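
The sequence described above can be modelled outside HBase with a small simulation. Everything in this sketch is hypothetical (the classes SelfReportDemo and FakeServer, their fields, and the guardAfterWait flag). It only models the control flow reported in this issue, makes no claim about how keepLooping() behaves in HBase, and is not the fix applied in HBase. Re-checking the abort flag after the wait returns is shown as one possible guard.

```java
import java.util.ArrayList;
import java.util.List;

public class SelfReportDemo {
    /** Hypothetical stand-in for the server; not an HBase class. */
    static class FakeServer {
        volatile boolean aborted;
        final List<String> events = new ArrayList<>();

        // Models waitForMasterActive() when tablesOnMaster is false:
        // in this model only an abort releases the waiting thread.
        void waitForMasterActive() throws InterruptedException {
            while (!aborted) {
                Thread.sleep(10);
            }
            events.add("wait released");
        }

        // Models the registration step that follows the wait.
        void register(boolean guardAfterWait) {
            if (guardAfterWait && aborted) {
                events.add("skipped reportForDuty");
                return;
            }
            events.add("reportForDuty");
        }
    }

    public static List<String> run(boolean guardAfterWait) {
        FakeServer server = new FakeServer();
        Thread serverThread = new Thread(() -> {
            try {
                server.waitForMasterActive();
                server.register(guardAfterWait);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        serverThread.start();
        server.aborted = true; // initialization failed, so the master aborts
        try {
            serverThread.join(); // join makes the recorded events visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return server.events;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [wait released, reportForDuty]
        System.out.println(run(true));  // [wait released, skipped reportForDuty]
    }
}
```

Without the guard, the model reproduces the behaviour seen in the log: the abort releases the wait and the thread goes on to report for duty.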



> HMaster may send reportForDuty himself while shutting down
> ----------------------------------------------------------
>
>                 Key: HBASE-16805
>                 URL: https://issues.apache.org/jira/browse/HBASE-16805
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: Pankaj Kumar
>            Assignee: Pankaj Kumar
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
