hbase-issues mailing list archives

From "zhangduo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13194) TableNamespaceManager not ready cause MasterQuotaManager initialization fail
Date Wed, 11 Mar 2015 13:22:38 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356854#comment-14356854
] 

zhangduo commented on HBASE-13194:
----------------------------------

The problem seems to be here.
{noformat}
2015-03-10 22:42:01,337 INFO  [MASTER_SERVER_OPERATIONS-hemera:48616-0] handler.ServerShutdownHandler(186):
Mark regions in recovery for crashed server hemera.apache.org,36185,1426027305449 before assignment;
regions=[{ENCODED => 969aa3ccca0a77c1d68f296b93b2d064, NAME => 'hbase:namespace,,1426027307874.969aa3ccca0a77c1d68f296b93b2d064.',
STARTKEY => '', ENDKEY => ''}]
2015-03-10 22:42:01,338 DEBUG [MASTER_SERVER_OPERATIONS-hemera:48616-0] zookeeper.ZKUtil(745):
master:48616-0x14c05d9d745000b, quorum=localhost:63193, baseZNode=/hbase Unable to get data
of znode /hbase/recovering-regions/969aa3ccca0a77c1d68f296b93b2d064 because node does not
exist (not an error)
2015-03-10 22:42:01,351 INFO  [hemera:48616.activeMasterManager] master.AssignmentManager(416):
Joined the cluster in 69ms, failover=true
2015-03-10 22:42:01,360 DEBUG [MASTER_SERVER_OPERATIONS-hemera:48616-0] coordination.ZKSplitLogManagerCoordination(650):
Marked 969aa3ccca0a77c1d68f296b93b2d064 as recovering from hemera.apache.org,36185,1426027305449:
/hbase/recovering-regions/969aa3ccca0a77c1d68f296b93b2d064/hemera.apache.org,36185,1426027305449
2015-03-10 22:42:01,360 DEBUG [MASTER_SERVER_OPERATIONS-hemera:48616-0] master.RegionStates(492):
Adding to processed servers hemera.apache.org,36185,1426027305449
2015-03-10 22:42:01,360 INFO  [MASTER_SERVER_OPERATIONS-hemera:48616-0] master.RegionStates(1074):
Transition {969aa3ccca0a77c1d68f296b93b2d064 state=OPEN, ts=1426027321326, server=hemera.apache.org,36185,1426027305449}
to {969aa3ccca0a77c1d68f296b93b2d064 state=OFFLINE, ts=1426027321360, server=hemera.apache.org,36185,1426027305449}
2015-03-10 22:42:01,361 INFO  [MASTER_SERVER_OPERATIONS-hemera:48616-0] master.RegionStateStore(207):
Updating row hbase:namespace,,1426027307874.969aa3ccca0a77c1d68f296b93b2d064. with state=OFFLINE
2015-03-10 22:42:01,369 INFO  [MASTER_SERVER_OPERATIONS-hemera:48616-0] handler.ServerShutdownHandler(218):
Reassigning 1 region(s) that hemera.apache.org,36185,1426027305449 was carrying (and 0 regions(s)
that were opening on this server)
{noformat}
HMaster is also a RegionServer that carries the system table regions. When it restarts, it seems
the hbase:namespace region's state is still OPEN until we begin to recover it, so the isTableAssigned
check in TableNamespaceManager.start passes, but the following calls to isTableAvailableAndInitialized
all fail because recovery has only just begun and the region state has been transitioned to OFFLINE.
I am not sure why this happens; I think the state should not be OPEN when HMaster starts. Will
continue tomorrow.
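
To make the suspected race concrete, here is a simplified, self-contained model of the sequence above.
This is plain Java, not HBase code; the method names only mirror isTableAssigned and
isTableAvailableAndInitialized for illustration, and the real checks live in TableNamespaceManager and HMaster.
{code:java}
// Simplified, standalone model of the race described above (not HBase code):
// the namespace region still reads OPEN from the pre-crash state when the
// start-time assignment check runs, but the shutdown handler then moves it
// to OFFLINE for reassignment, so the availability checks that follow fail.
public class NamespaceStartupRace {

    enum RegionState { OPEN, OFFLINE }

    // Stale state carried over from before the RegionServer crash.
    private RegionState namespaceRegionState = RegionState.OPEN;

    // Stands in for the isTableAssigned check in TableNamespaceManager.start().
    boolean isTableAssigned() {
        return namespaceRegionState == RegionState.OPEN;
    }

    // Stands in for the later isTableAvailableAndInitialized calls.
    boolean isTableAvailableAndInitialized() {
        return namespaceRegionState == RegionState.OPEN;
    }

    // Stands in for ServerShutdownHandler marking the region for reassignment.
    void recoverCrashedServer() {
        namespaceRegionState = RegionState.OFFLINE;
    }

    public static void main(String[] args) {
        NamespaceStartupRace master = new NamespaceStartupRace();

        // 1. Start-time check passes on the stale OPEN state.
        System.out.println("isTableAssigned: " + master.isTableAssigned());        // true

        // 2. The shutdown handler transitions the region OPEN -> OFFLINE.
        master.recoverCrashedServer();

        // 3. Every later readiness check now fails until reassignment completes.
        System.out.println("isTableAvailableAndInitialized: "
            + master.isTableAvailableAndInitialized());                            // false
    }
}
{code}
Under this reading, the later checks can only start passing again once the namespace region has actually
been reassigned and reopened.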

> TableNamespaceManager not ready cause MasterQuotaManager initialization fail 
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-13194
>                 URL: https://issues.apache.org/jira/browse/HBASE-13194
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: zhangduo
>
> This causes TestNamespaceAuditor to fail.
> https://builds.apache.org/job/HBase-TRUNK/6237/testReport/junit/org.apache.hadoop.hbase.namespace/TestNamespaceAuditor/testRegionOperations/
> {noformat}
> 2015-03-10 22:42:01,372 ERROR [hemera:48616.activeMasterManager] namespace.NamespaceStateManager(204):
Error while update namespace state.
> java.io.IOException: Table Namespace Manager not ready yet, try again later
> 	at org.apache.hadoop.hbase.master.HMaster.checkNamespaceManagerReady(HMaster.java:1912)
> 	at org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:2131)
> 	at org.apache.hadoop.hbase.namespace.NamespaceStateManager.initialize(NamespaceStateManager.java:188)
> 	at org.apache.hadoop.hbase.namespace.NamespaceStateManager.start(NamespaceStateManager.java:63)
> 	at org.apache.hadoop.hbase.namespace.NamespaceAuditor.start(NamespaceAuditor.java:57)
> 	at org.apache.hadoop.hbase.quotas.MasterQuotaManager.start(MasterQuotaManager.java:88)
> 	at org.apache.hadoop.hbase.master.HMaster.initQuotaManager(HMaster.java:902)
> 	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:756)
> 	at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:161)
> 	at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1455)
> 	at java.lang.Thread.run(Thread.java:744)
> {noformat}
> The direct reason is that we do not have a retry here: if init fails once, it fails permanently.
But I skimmed the code, and there seem to be no async init operations when calling finishActiveMasterInitialization,
so it is very strange. Need to dig more.
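
As a rough illustration of the retry idea above, the sketch below polls a readiness check for a bounded
time instead of giving up after the first failure. It is standalone Java with hypothetical helper names,
not the actual HMaster/MasterQuotaManager wiring.
{code:java}
// Minimal sketch of a bounded retry around a "not ready yet" condition
// (standalone, assumed helper names; not the actual HBase initialization path).
import java.io.IOException;
import java.util.function.BooleanSupplier;

public class RetryOnNotReady {

    // Waits until the supplied readiness check passes, or gives up after
    // maxAttempts tries spaced sleepMillis apart.
    static void waitUntilReady(BooleanSupplier isReady, int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isReady.getAsBoolean()) {
                return;
            }
            Thread.sleep(sleepMillis);
        }
        throw new IOException("Table Namespace Manager not ready after "
            + maxAttempts + " attempts");
    }

    public static void main(String[] args) throws Exception {
        // Toy readiness check that becomes true after ~1.5 seconds, standing in
        // for the namespace region coming back online after reassignment.
        long readyAt = System.currentTimeMillis() + 1500;
        waitUntilReady(() -> System.currentTimeMillis() >= readyAt, 10, 500);
        System.out.println("namespace manager ready, quota manager can start");
    }
}
{code}
In a real fix this would amount to retrying (or deferring) the namespace-manager readiness check rather
than letting the first IOException permanently abort quota manager initialization.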



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
