hbase-dev mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-19726) Failed to start HMaster due to infinite retrying on meta assign
Date Sun, 07 Jan 2018 13:26:00 GMT
Duo Zhang created HBASE-19726:
---------------------------------

             Summary: Failed to start HMaster due to infinite retrying on meta assign
                 Key: HBASE-19726
                 URL: https://issues.apache.org/jira/browse/HBASE-19726
             Project: HBase
          Issue Type: Bug
            Reporter: Duo Zhang


This is what I got at first: an exception when trying to write to the meta table before meta
had come online.

{noformat}
2018-01-07,21:03:14,389 INFO org.apache.hadoop.hbase.master.HMaster: Running RecoverMetaProcedure
to ensure proper hbase:meta deploy.
2018-01-07,21:03:14,637 INFO org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure:
Start pid=1, state=RUNNABLE:RECOVER_META_SPLIT_LOGS; RecoverMetaProcedure failedMetaServer=null,
splitWal=true
2018-01-07,21:03:14,645 INFO org.apache.hadoop.hbase.master.MasterWalManager: Log folder hdfs://c402tst-community/hbase/c402tst-community/WALs/c4-hadoop-tst-st27.bj,38900,1515330173896
belongs to an existing region server
2018-01-07,21:03:14,646 INFO org.apache.hadoop.hbase.master.MasterWalManager: Log folder hdfs://c402tst-community/hbase/c402tst-community/WALs/c4-hadoop-tst-st29.bj,38900,1515330177232
belongs to an existing region server
2018-01-07,21:03:14,648 INFO org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure:
pid=1, state=RUNNABLE:RECOVER_META_ASSIGN_REGIONS; RecoverMetaProcedure failedMetaServer=null,
splitWal=true; Retaining meta assignment to server=null
2018-01-07,21:03:14,653 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
subprocedures=[{pid=2, ppid=1, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta,
region=1588230740}]
2018-01-07,21:03:14,660 INFO org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler:
pid=2, ppid=1, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740
hbase:meta hbase:meta,,1.1588230740
2018-01-07,21:03:14,663 INFO org.apache.hadoop.hbase.master.assignment.AssignProcedure: Start
pid=2, ppid=1, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740;
rit=OFFLINE, location=null; forceNewPlan=false, retain=false
2018-01-07,21:03:14,831 INFO org.apache.hadoop.hbase.zookeeper.MetaTableLocator: Setting hbase:meta
(replicaId=0) location in ZooKeeper as c4-hadoop-tst-st27.bj,38900,1515330173896
2018-01-07,21:03:14,841 INFO org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure:
Dispatch pid=2, ppid=1, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:meta,
region=1588230740; rit=OPENING, location=c4-hadoop-tst-st27.bj,38900,1515330173896
2018-01-07,21:03:14,992 INFO org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher:
Using procedure batch rpc execution for serverName=c4-hadoop-tst-st27.bj,38900,1515330173896
version=3145728
2018-01-07,21:03:15,593 ERROR org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: Cannot
get replica 0 location for {"totalColumns":1,"row":"hbase:meta","families":{"table":[{"qualifier":"state","vlen":2,"tag":[],"timestamp":1515330195514}]},"ts":1515330195514}
2018-01-07,21:03:15,594 WARN org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure:
Retryable error trying to transition: pid=2, ppid=1, state=RUNNABLE:REGION_TRANSITION_FINISH;
AssignProcedure table=hbase:meta, region=1588230740; rit=OPEN, location=c4-hadoop-tst-st27.bj,38900,1515330173896
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IOException:
1 time, servers with issues: null
        at org.apache.hadoop.hbase.client.BatchErrors.makeException(BatchErrors.java:54)
        at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.getErrors(AsyncRequestFutureImpl.java:1250)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:457)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:570)
        at org.apache.hadoop.hbase.MetaTableAccessor.put(MetaTableAccessor.java:1450)
        at org.apache.hadoop.hbase.MetaTableAccessor.putToMetaTable(MetaTableAccessor.java:1439)
        at org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1785)
        at org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1151)
        at org.apache.hadoop.hbase.master.TableStateManager.udpateMetaState(TableStateManager.java:183)
        at org.apache.hadoop.hbase.master.TableStateManager.setTableState(TableStateManager.java:69)
        at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsOpened(AssignmentManager.java:1515)
        at org.apache.hadoop.hbase.master.assignment.AssignProcedure.finishTransition(AssignProcedure.java:271)
        at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:320)
        at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:86)
        at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
{noformat}

After that, the following exception was repeated indefinitely:
{noformat}
2018-01-07,21:03:15,596 WARN org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure:
Retryable error trying to transition: pid=2, ppid=1, state=RUNNABLE:REGION_TRANSITION_FINISH;
AssignProcedure table=hbase:meta, region=1588230740; rit=OPEN, location=c4-hadoop-tst-st27.bj,38900,1515330173896
org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected [OFFLINE, CLOSED, SPLITTING,
SPLIT, OPENING, FAILED_OPEN] so could move to OPEN but current state=OPEN
        at org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:155)
        at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsOpened(AssignmentManager.java:1513)
        at org.apache.hadoop.hbase.master.assignment.AssignProcedure.finishTransition(AssignProcedure.java:271)
        at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:320)
        at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:86)
        at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
{noformat}
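
Reading the two stack traces together suggests why the retry can never succeed. The first attempt fails at AssignmentManager.java:1515 (the meta write), after the in-memory transition at line 1513 has already moved the region to OPEN (the log shows rit=OPEN). Every retry then fails at line 1513, because OPEN is not an accepted starting state. The following is a minimal, self-contained sketch of that pattern, not HBase code: the class RetryLoopSketch, its State enum, and the exception types are hypothetical stand-ins chosen for illustration.

```java
import java.io.IOException;

public class RetryLoopSketch {
  // Simplified stand-in for the region states named in the log.
  enum State { OFFLINE, OPENING, OPEN }

  // Region state held in memory; the region is OPENING when the step first runs.
  private State state = State.OPENING;

  // Hypothetical stand-in for markRegionAsOpened: first the in-memory
  // transition, then a write that needs meta to be online.
  void markRegionAsOpened(boolean metaOnline) throws IOException {
    if (state == State.OPEN) {
      // Mirrors the UnexpectedStateException seen on every retry.
      throw new IllegalStateException("so could move to OPEN but current state=OPEN");
    }
    state = State.OPEN; // the in-memory transition succeeds
    if (!metaOnline) {
      // Mirrors the failed put to meta on the first attempt.
      throw new IOException("Cannot get replica 0 location");
    }
  }

  // Runs one attempt and reports which kind of failure occurred, if any.
  String attempt(boolean metaOnline) {
    try {
      markRegionAsOpened(metaOnline);
      return "ok";
    } catch (Exception e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    RetryLoopSketch sketch = new RetryLoopSketch();
    System.out.println(sketch.attempt(false)); // first attempt: the meta write fails
    System.out.println(sketch.attempt(false)); // retry: the state check fails
    System.out.println(sketch.attempt(false)); // and so does every later retry
  }
}
```

In this sketch the step is not idempotent: the state change is kept even though the step as a whole failed, so a plain retry cannot make progress.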

This is a bit strange. Since we are assigning meta, why do we need to write the state to the
meta table?

I checked the code a bit.
In AssignProcedure.finishTransition, we will do this
{code}
   env.getAssignmentManager().markRegionAsOpened(regionNode);
{code}
And in AssignmentManager.markRegionAsOpened, we will do this
{code}
      if (isMetaRegion(hri)) {
        master.getTableStateManager().setTableState(TableName.META_TABLE_NAME,
            TableState.State.ENABLED);
        setMetaInitialized(hri, true);
      }
{code}

And in TableStateManager.setTableState, we will call udpateMetaState (a typo...) to write something
to meta.

I think this will lead to a deadlock? I do not think we need to put the state of the meta table
into the meta table, since it is always enabled...
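
One way to picture the idea above is a guard that never persists a state for the meta table and always reports it as enabled. This is only a sketch under stated assumptions, not the fix that was applied for this issue and not the real TableStateManager: the class MetaStateGuardSketch, its string-based states, and the in-memory map standing in for rows written to meta are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class MetaStateGuardSketch {
  static final String META_TABLE_NAME = "hbase:meta";

  // Hypothetical stand-in for the table state rows that would be written to meta.
  private final Map<String, String> persistedStates = new HashMap<>();

  // Persists a table state, except for the meta table itself.
  void setTableState(String tableName, String newState) {
    if (META_TABLE_NAME.equals(tableName)) {
      // Assumption from the report: meta is always enabled, so there is
      // nothing to write, and no write can fail while meta is offline.
      return;
    }
    persistedStates.put(tableName, newState);
  }

  // Reads a table state; the meta table is answered without any lookup.
  String getTableState(String tableName) {
    if (META_TABLE_NAME.equals(tableName)) {
      return "ENABLED";
    }
    return persistedStates.get(tableName);
  }

  // Number of states that were actually persisted.
  int persistedCount() {
    return persistedStates.size();
  }

  public static void main(String[] args) {
    MetaStateGuardSketch manager = new MetaStateGuardSketch();
    manager.setTableState(META_TABLE_NAME, "ENABLED"); // skipped, nothing written
    manager.setTableState("t1", "DISABLED");           // persisted as usual
    System.out.println(manager.getTableState(META_TABLE_NAME));
    System.out.println(manager.getTableState("t1"));
    System.out.println(manager.persistedCount());
  }
}
```

With a guard like this, opening meta would no longer depend on a write to meta, which is the dependency questioned above.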

But I do not know why it worked when I tried to restart the cluster... Maybe we do not enter
this code path for a non-fresh cluster?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
