hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-19726) Failed to start HMaster due to infinite retrying on meta assign
Date Sat, 03 Feb 2018 02:16:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-19726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351197#comment-16351197 ]

Hadoop QA commented on HBASE-19726:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m  6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 59s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 23s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m  4s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 16m 18s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}103m 34s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}138m 43s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19726 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12909044/19726.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux f54c041d6b59 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 8143d5afa4 |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11367/testReport/ |
| Max. process+thread count | 5094 (vs. ulimit of 10000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11367/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Failed to start HMaster due to infinite retrying on meta assign
> ---------------------------------------------------------------
>
>                 Key: HBASE-19726
>                 URL: https://issues.apache.org/jira/browse/HBASE-19726
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Duo Zhang
>            Assignee: stack
>            Priority: Major
>             Fix For: 2.0.0-beta-2
>
>         Attachments: 19726.patch
>
>
> This is what I got at first: an exception when trying to write to meta while meta had not yet been brought online.
> {noformat}
> 2018-01-07,21:03:14,389 INFO org.apache.hadoop.hbase.master.HMaster: Running RecoverMetaProcedure to ensure proper hbase:meta deploy.
> 2018-01-07,21:03:14,637 INFO org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure: Start pid=1, state=RUNNABLE:RECOVER_META_SPLIT_LOGS; RecoverMetaProcedure failedMetaServer=null, splitWal=true
> 2018-01-07,21:03:14,645 INFO org.apache.hadoop.hbase.master.MasterWalManager: Log folder hdfs://c402tst-community/hbase/c402tst-community/WALs/c4-hadoop-tst-st27.bj,38900,1515330173896 belongs to an existing region server
> 2018-01-07,21:03:14,646 INFO org.apache.hadoop.hbase.master.MasterWalManager: Log folder hdfs://c402tst-community/hbase/c402tst-community/WALs/c4-hadoop-tst-st29.bj,38900,1515330177232 belongs to an existing region server
> 2018-01-07,21:03:14,648 INFO org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure: pid=1, state=RUNNABLE:RECOVER_META_ASSIGN_REGIONS; RecoverMetaProcedure failedMetaServer=null, splitWal=true; Retaining meta assignment to server=null
> 2018-01-07,21:03:14,653 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=2, ppid=1, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740}]
> 2018-01-07,21:03:14,660 INFO org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: pid=2, ppid=1, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740 hbase:meta hbase:meta,,1.1588230740
> 2018-01-07,21:03:14,663 INFO org.apache.hadoop.hbase.master.assignment.AssignProcedure: Start pid=2, ppid=1, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740; rit=OFFLINE, location=null; forceNewPlan=false, retain=false
> 2018-01-07,21:03:14,831 INFO org.apache.hadoop.hbase.zookeeper.MetaTableLocator: Setting hbase:meta (replicaId=0) location in ZooKeeper as c4-hadoop-tst-st27.bj,38900,1515330173896
> 2018-01-07,21:03:14,841 INFO org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Dispatch pid=2, ppid=1, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:meta, region=1588230740; rit=OPENING, location=c4-hadoop-tst-st27.bj,38900,1515330173896
> 2018-01-07,21:03:14,992 INFO org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher: Using procedure batch rpc execution for serverName=c4-hadoop-tst-st27.bj,38900,1515330173896 version=3145728
> 2018-01-07,21:03:15,593 ERROR org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: Cannot get replica 0 location for {"totalColumns":1,"row":"hbase:meta","families":{"table":[{"qualifier":"state","vlen":2,"tag":[],"timestamp":1515330195514}]},"ts":1515330195514}
> 2018-01-07,21:03:15,594 WARN org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Retryable error trying to transition: pid=2, ppid=1, state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure table=hbase:meta, region=1588230740; rit=OPEN, location=c4-hadoop-tst-st27.bj,38900,1515330173896
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IOException: 1 time, servers with issues: null
>         at org.apache.hadoop.hbase.client.BatchErrors.makeException(BatchErrors.java:54)
>         at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.getErrors(AsyncRequestFutureImpl.java:1250)
>         at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:457)
>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:570)
>         at org.apache.hadoop.hbase.MetaTableAccessor.put(MetaTableAccessor.java:1450)
>         at org.apache.hadoop.hbase.MetaTableAccessor.putToMetaTable(MetaTableAccessor.java:1439)
>         at org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1785)
>         at org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1151)
>         at org.apache.hadoop.hbase.master.TableStateManager.udpateMetaState(TableStateManager.java:183)
>         at org.apache.hadoop.hbase.master.TableStateManager.setTableState(TableStateManager.java:69)
>         at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsOpened(AssignmentManager.java:1515)
>         at org.apache.hadoop.hbase.master.assignment.AssignProcedure.finishTransition(AssignProcedure.java:271)
>         at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:320)
>         at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:86)
>         at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
> {noformat}
> After that, the following exception was repeated indefinitely:
> {noformat}
> 2018-01-07,21:03:15,596 WARN org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Retryable error trying to transition: pid=2, ppid=1, state=RUNNABLE:REGION_TRANSITION_FINISH; AssignProcedure table=hbase:meta, region=1588230740; rit=OPEN, location=c4-hadoop-tst-st27.bj,38900,1515330173896
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected [OFFLINE, CLOSED, SPLITTING, SPLIT, OPENING, FAILED_OPEN] so could move to OPEN but current state=OPEN
>         at org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:155)
>         at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsOpened(AssignmentManager.java:1513)
>         at org.apache.hadoop.hbase.master.assignment.AssignProcedure.finishTransition(AssignProcedure.java:271)
>         at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:320)
>         at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:86)
>         at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
> {noformat}
> This is a bit strange. Since we are assigning meta, why do we need to write the state to the meta table?
> I checked the code a bit.
> In AssignProcedure.finishTransition, we do this:
> {code}
>    env.getAssignmentManager().markRegionAsOpened(regionNode);
> {code}
> And in AssignmentManager.markRegionAsOpened, we do this:
> {code}
>       if (isMetaRegion(hri)) {
>         master.getTableStateManager().setTableState(TableName.META_TABLE_NAME,
>             TableState.State.ENABLED);
>         setMetaInitialized(hri, true);
>       }
> {code}
> And in TableStateManager.setTableState, we call udpateMetaState (a typo...) to write the table state to meta.
> I think this will lead to a deadlock? I do not think we need to put the state of the meta table into the meta table; it is always enabled...
> But I do not know why it worked when I tried restarting the cluster... Maybe we do not enter this code path for a non-fresh cluster?
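
The retry loop described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not HBase code: `RegionNode`, `writeTableStateToMeta`, `runWithRetries`, and the reduced `State` enum are hypothetical stand-ins, and `IllegalStateException` stands in for `UnexpectedStateException`. It shows why the first attempt fails with an I/O error while every later attempt fails on the state check: the in-memory state is moved to OPEN before the write to meta is attempted, and the write cannot succeed until meta is marked online.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class MetaAssignRetrySketch {
    // Reduced, hypothetical state set; the real list in the log is longer.
    enum State { OFFLINE, OPENING, OPEN }

    /** Hypothetical stand-in for an in-memory region state node. */
    static class RegionNode {
        State state = State.OPENING;

        void transitionTo(State target, Set<State> expected) {
            if (!expected.contains(state)) {
                throw new IllegalStateException("Expected " + expected
                    + " so could move to " + target + " but current state=" + state);
            }
            state = target;
        }
    }

    // Becomes true only after markRegionAsOpened completes.
    static boolean metaOnline = false;

    /** Hypothetical stand-in for the table-state write; fails while meta is offline. */
    static void writeTableStateToMeta() throws IOException {
        if (!metaOnline) {
            throw new IOException("Cannot get replica 0 location");
        }
    }

    /** Mirrors the ordering in the report: state change, then meta write, then mark online. */
    static void markRegionAsOpened(RegionNode node) throws IOException {
        node.transitionTo(State.OPEN, EnumSet.of(State.OFFLINE, State.OPENING));
        writeTableStateToMeta();
        metaOnline = true;
    }

    /** Retries up to maxAttempts times and records the exception type of each failure. */
    static List<String> runWithRetries(int maxAttempts) {
        metaOnline = false;
        RegionNode node = new RegionNode();
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < maxAttempts; i++) {
            try {
                markRegionAsOpened(node);
                return failures;
            } catch (IOException | IllegalStateException e) {
                failures.add(e.getClass().getSimpleName());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // prints [IOException, IllegalStateException, IllegalStateException, IllegalStateException]
        System.out.println(runWithRetries(4));
    }
}
```

In this sketch the loop is bounded only so that it terminates. Without the bound, the state check would fail on every retry, which matches the repeated exception in the second log excerpt.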



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
