hbase-dev mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-18408) AM consumes CPU and fills up the logs really fast when there is no RS to assign
Date Tue, 18 Jul 2017 23:49:00 GMT
Enis Soztutar created HBASE-18408:
-------------------------------------

             Summary: AM consumes CPU and fills up the logs really fast when there is no RS to assign
                 Key: HBASE-18408
                 URL: https://issues.apache.org/jira/browse/HBASE-18408
             Project: HBase
          Issue Type: Bug
            Reporter: Enis Soztutar


I was testing something else when I discovered that when there is no RS to assign a region
to (while the master is still alive), the AM/LB produces gigabytes of logs.

Logs like this:
{code}
2017-07-18 16:40:00,712 WARN  [AssignmentThread] balancer.BaseLoadBalancer: Wanted to do round robin assignment but no servers to assign to
2017-07-18 16:40:00,712 WARN  [AssignmentThread] assignment.AssignmentManager: unable to round-robin assignment
org.apache.hadoop.hbase.HBaseIOException: unable to compute plans for regions=1
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager.acceptPlan(AssignmentManager.java:1725)
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processAssignQueue(AssignmentManager.java:1711)
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager.access$300(AssignmentManager.java:108)
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager$2.run(AssignmentManager.java:1587)
2017-07-18 16:40:00,865 WARN  [AssignmentThread] balancer.BaseLoadBalancer: Wanted to do round robin assignment but no servers to assign to
2017-07-18 16:40:00,866 WARN  [AssignmentThread] assignment.AssignmentManager: unable to round-robin assignment
org.apache.hadoop.hbase.HBaseIOException: unable to compute plans for regions=1
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager.acceptPlan(AssignmentManager.java:1725)
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processAssignQueue(AssignmentManager.java:1711)
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager.access$300(AssignmentManager.java:108)
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager$2.run(AssignmentManager.java:1587)
2017-07-18 16:40:01,019 WARN  [AssignmentThread] balancer.BaseLoadBalancer: Wanted to do round robin assignment but no servers to assign to
2017-07-18 16:40:01,019 WARN  [AssignmentThread] assignment.AssignmentManager: unable to round-robin assignment
org.apache.hadoop.hbase.HBaseIOException: unable to compute plans for regions=1
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager.acceptPlan(AssignmentManager.java:1725)
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processAssignQueue(AssignmentManager.java:1711)
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager.access$300(AssignmentManager.java:108)
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager$2.run(AssignmentManager.java:1587)
2017-07-18 16:40:01,173 WARN  [AssignmentThread] balancer.BaseLoadBalancer: Wanted to do round robin assignment but no servers to assign to
2017-07-18 16:40:01,173 WARN  [AssignmentThread] assignment.AssignmentManager: unable to round-robin assignment
org.apache.hadoop.hbase.HBaseIOException: unable to compute plans for regions=1
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager.acceptPlan(AssignmentManager.java:1725)
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processAssignQueue(AssignmentManager.java:1711)
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager.access$300(AssignmentManager.java:108)
	at org.apache.hadoop.hbase.master.assignment.AssignmentManager$2.run(AssignmentManager.java:1587)
{code}
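
To put a number on "really fast", the gaps between consecutive attempts can be read off the timestamps in the excerpt above. The following is only a small illustrative sketch: the class and method names (`RetryGaps`, `gapsMillis`) are made up for this example and are not part of HBase. It parses the four `BaseLoadBalancer` timestamps and prints the interval between each pair.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for this example only; not an HBase class.
class RetryGaps {
    // Timestamp layout used in the log excerpt, e.g. "2017-07-18 16:40:00,712".
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the gap in milliseconds between each pair of consecutive timestamps. */
    public static List<Long> gapsMillis(List<String> timestamps) {
        List<Long> gaps = new ArrayList<>();
        for (int i = 1; i < timestamps.size(); i++) {
            LocalDateTime prev = LocalDateTime.parse(timestamps.get(i - 1), FMT);
            LocalDateTime cur = LocalDateTime.parse(timestamps.get(i), FMT);
            gaps.add(Duration.between(prev, cur).toMillis());
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Timestamps of the four BaseLoadBalancer warnings in the excerpt.
        List<String> ts = List.of(
            "2017-07-18 16:40:00,712",
            "2017-07-18 16:40:00,865",
            "2017-07-18 16:40:01,019",
            "2017-07-18 16:40:01,173");
        System.out.println(gapsMillis(ts)); // prints [153, 154, 154]
    }
}
```

In this excerpt the attempts are roughly 150 ms apart, so a single unassignable region produces two warnings and a stack trace about six times per second.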

Reproduction is easy:
 - Start a pseudo-distributed cluster
 - Create a table
 - Kill the region server

I have also noticed another case where we just spin on CPU, consuming 100-200% (though this
is on a very old code base from master), in this cycle:
{code}
"ProcedureExecutor-0" #106 daemon prio=5 os_prio=0 tid=0x00007fab54851800 nid=0xcf1 runnable [0x00007fab4e7b0000]
   java.lang.Thread.State: RUNNABLE
	at java.lang.Object.hashCode(Native Method)
	at java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1106)
	at java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1097)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.close(HRegion.java:6158)
	- locked <0x00000000c4cb62e8> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
	at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6829)
	at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6790)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2125)
	at org.apache.hadoop.hbase.client.HTable$1.call(HTable.java:425)
	at org.apache.hadoop.hbase.client.HTable$1.call(HTable.java:416)
	at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:102)
	at org.apache.hadoop.hbase.client.HTable.get(HTable.java:433)
	at org.apache.hadoop.hbase.client.HTable.get(HTable.java:399)
	at org.apache.hadoop.hbase.MetaTableAccessor.getTableState(MetaTableAccessor.java:1084)
	at org.apache.hadoop.hbase.master.TableStateManager.readMetaState(TableStateManager.java:188)
	at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:172)
	at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:131)
	at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.processDeadRegion(ServerCrashProcedure.java:666)
	at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.calcRegionsToAssign(ServerCrashProcedure.java:460)
	at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:254)
	at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:72)
	at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:133)
	at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:523)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1061)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:855)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:808)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:75)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:495)
{code}
I think this happens when meta is not hosted in master. 
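
For the first symptom (the assignment retry with no RS available), one common way to keep a failing retry from flooding the logs is a capped exponential backoff between attempts. The sketch below is only an illustration of that general technique, not the HBase fix and not AssignmentManager code; the class name `BackoffSketch`, the method `backoffMillis`, and the 100 ms base and 5000 ms cap are all assumptions chosen for the example.

```java
// Hypothetical sketch of capped exponential backoff; not HBase code.
class BackoffSketch {
    /**
     * Delay before retry number `attempt` (0-based): baseMillis doubled
     * `attempt` times, but never more than maxMillis.
     */
    public static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis;
        // Stop doubling once the cap is reached so the value cannot keep growing.
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    public static void main(String[] args) {
        // Assumed values for illustration: 100 ms base delay, 5 s cap.
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("attempt " + attempt + " -> wait "
                + backoffMillis(attempt, 100, 5000) + " ms");
        }
    }
}
```

With these assumed values the wait grows from 100 ms to the 5 s cap within seven attempts, so a stuck assignment would log at most once every few seconds instead of several times per second.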



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
