hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18244) org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails
Date Wed, 21 Jun 2017 16:29:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057795#comment-16057795
] 

Josh Elser commented on HBASE-18244:
------------------------------------

Ok! Let me get a patch up to temporarily disable this test.

> org.apache.hadoop.hbase.client.rsgroup.TestShellRSGroups hangs/fails
> --------------------------------------------------------------------
>
>                 Key: HBASE-18244
>                 URL: https://issues.apache.org/jira/browse/HBASE-18244
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>            Reporter: Josh Elser
>            Assignee: Stephen Yuan Jiang
>             Fix For: 2.0.0
>
>
> Sometime in the past couple of weeks, TestShellRSGroups has started timing-out/failing
for me.
> It will get stuck on a call to moveTables()
> {noformat}
> "main" #1 prio=5 os_prio=31 tid=0x00007ff012004800 nid=0x1703 in Object.wait() [0x000070000020d000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:502)
>         at org.apache.hadoop.hbase.ipc.BlockingRpcCallback.get(BlockingRpcCallback.java:62)
>         - locked <0x000000078d1003f0> (a org.apache.hadoop.hbase.ipc.BlockingRpcCallback)
>         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:328)
>         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:94)
>         at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:567)
>         at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$BlockingStub.execMasterService(MasterProtos.java)
>         at org.apache.hadoop.hbase.client.ConnectionImplementation$3.execMasterService(ConnectionImplementation.java:1500)
>         at org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2991)
>         at org.apache.hadoop.hbase.client.HBaseAdmin$67$1.rpcCall(HBaseAdmin.java:2986)
>         at org.apache.hadoop.hbase.client.MasterCallable.call(MasterCallable.java:98)
>         at org.apache.hadoop.hbase.client.HBaseAdmin$67.callExecService(HBaseAdmin.java:2997)
>         at org.apache.hadoop.hbase.client.SyncCoprocessorRpcChannel.callBlockingMethod(SyncCoprocessorRpcChannel.java:69)
>         at org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService$BlockingStub.moveTables(RSGroupAdminProtos.java:13171)
>         at org.apache.hadoop.hbase.rsgroup.RSGroupAdminClient.moveTables(RSGroupAdminClient.java:117)
> {noformat}
> The server-side end of the RPC is waiting on a procedure to finish:
> {noformat}
> "RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242" #289 daemon prio=5 os_prio=31
tid=0x00007ff015b7c000 nid=0x1e603 waiting on condition [0x000070000dbc9000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:184)
>         at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:171)
>         at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToComplete(ProcedureSyncWait.java:141)
>         at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToCompleteIOE(ProcedureSyncWait.java:130)
>         at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.submitAndWaitProcedure(ProcedureSyncWait.java:123)
>         at org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:478)
>         at org.apache.hadoop.hbase.master.assignment.AssignmentManager.unassign(AssignmentManager.java:465)
>         at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:432)
>         at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl.moveTables(RSGroupAdminEndpoint.java:174)
>         at org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService.callMethod(RSGroupAdminProtos.java:12786)
>         at org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:673)
>         at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:278)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:258)
>    Locked ownable synchronizers:
>         - None
> {noformat}
> I don't see anything else running in the thread dump, but I do see that meta was closed.
Eventually the client (I think) gives up and retries, but then fails because meta isn't assigned.
> {noformat}
> 2017-06-20 15:07:45,720 DEBUG [RS_CLOSE_META-hw10447:64242-0] assignment.RegionTransitionProcedure(200):
Received report CLOSED seqId=-1, pid=12, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure
table=hbase:meta, region=1588230740, server=hw10447.local,64242,1497985650605; rit=CLOSING,
location=hw10447.local,64242,1497985650605
> 2017-06-20 15:07:45,720 DEBUG [RS_CLOSE_META-hw10447:64242-0] handler.CloseRegionHandler(122):
Closed hbase:meta,,1.1588230740
> 2017-06-20 15:07:45,722 WARN  [ProcExecWrkr-9] zookeeper.MetaTableLocator(443): Tried
to set null ServerName in hbase:meta; skipping -- ServerName required
> 2017-06-20 15:07:45,724 INFO  [ProcExecWrkr-9] procedure2.ProcedureExecutor(1167): Finished
pid=12, state=SUCCESS; UnassignProcedure table=hbase:meta, region=1588230740, server=hw10447.local,64242,1497985650605
in 379msec
> 2017-06-20 15:07:46,454 DEBUG [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242]
procedure.ProcedureSyncWait(192): waitFor pid=12
> 2017-06-20 15:07:47,458 INFO  [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242]
rsgroup.RSGroupAdminServer(345): Unassigning 0 region(s) from hw10447.local:64242 for server
move to test_group
> 2017-06-20 15:07:48,462 INFO  [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242]
rsgroup.RSGroupAdminServer(378): Move server done: default=>test_group
> Took 4.2749 seconds2017-06-20 15:07:48,488 DEBUG [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242]
rsgroup.RSGroupInfoManagerImpl(480): Updating znode: /hbase/rsgroup/test_group
> 2017-06-20 15:07:48,488 DEBUG [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242]
rsgroup.RSGroupInfoManagerImpl(480): Updating znode: /hbase/rsgroup/default
> 2017-06-20 15:07:48,488 DEBUG [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242]
rsgroup.RSGroupInfoManagerImpl(486): Writing ZK GroupInfo count: 4
> 2017-06-20 15:07:48,598 DEBUG [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242]
procedure2.ProcedureExecutor(792): Stored pid=13, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure,
tableName=test_table, type=EXCLUSIVE
> 2017-06-20 15:07:48,600 DEBUG [ProcExecWrkr-10] locking.LockProcedure(313): LOCKED pid=13,
state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, tableName=test_table,
type=EXCLUSIVE
> 2017-06-20 15:07:48,600 INFO  [ProcExecWrkr-10] procedure2.ProcedureExecutor$TimeoutExecutorThread(1743):
ADDED pid=13, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure,
tableName=test_table, type=EXCLUSIVE; timeout=600000, timestamp=1497986268600
> 2017-06-20 15:07:48,706 DEBUG [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242]
procedure2.ProcedureExecutor(792): Stored pid=14, state=RUNNABLE:REGION_TRANSITION_DISPATCH;
UnassignProcedure table=test_table, region=578de877b09db11dcc129a6ff5b22406, server=hw10447.local,64263,1497985652273
> 2017-06-20 15:07:49,179 WARN  [HBase-Metrics2-1] impl.MetricsConfig(125): Cannot locate
configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 2017-06-20 15:07:49,708 DEBUG [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=64242]
procedure.ProcedureSyncWait(192): waitFor pid=14
> 2017-06-20 15:08:02,376 DEBUG [ProcExecTimeout] procedure2.ProcedureExecutor$CompletedProcedureCleaner(190):
Evict completed AssignProcedure table=hbase:meta, region=1588230740 pid=1, owner=jelser.hfs.3,
state=SUCCESS, submittedTime=29sec ago, lastUpdate=29sec ago
> 2017-06-20 15:08:02,376 DEBUG [ProcExecTimeout] procedure2.ProcedureExecutor$CompletedProcedureCleaner(190):
Evict completed CreateTableProcedure table=hbase:namespace pid=2, owner=jelser.hfs.3, state=SUCCESS,
submittedTime=27sec ago, lastUpdate=27sec ago
> 2017-06-20 15:08:02,376 DEBUG [ProcExecTimeout] procedure2.ProcedureExecutor$CompletedProcedureCleaner(190):
Evict completed CreateNamespaceProcedure, namespace=default pid=4, owner=jelser.hfs.3, state=SUCCESS,
submittedTime=27sec ago, lastUpdate=27sec ago
> 2017-06-20 15:08:02,376 DEBUG [ProcExecTimeout] procedure2.ProcedureExecutor$CompletedProcedureCleaner(190):
Evict completed CreateTableProcedure table=hbase:rsgroup pid=5, owner=jelser.hfs.3, state=SUCCESS,
submittedTime=27sec ago, lastUpdate=27sec ago
> 2017-06-20 15:08:02,376 DEBUG [ProcExecTimeout] procedure2.ProcedureExecutor$CompletedProcedureCleaner(190):
Evict completed CreateNamespaceProcedure, namespace=hbase pid=7, owner=jelser.hfs.3, state=SUCCESS,
submittedTime=26sec ago, lastUpdate=26sec ago
> 2017-06-20 15:08:02,376 DEBUG [ProcExecTimeout] procedure2.ProcedureExecutor$CompletedProcedureCleaner(190):
Evict completed UnassignProcedure table=hbase:namespace, region=a47624f5345a80b72f278f79a432955b,
server=hw10447.local,64242,1497985650605 pid=11, owner=jelser, state=SUCCESS, submittedTime=18sec
ago, lastUpdate=18sec ago
> 2017-06-20 15:08:02,376 DEBUG [ProcExecTimeout] procedure2.ProcedureExecutor$CompletedProcedureCleaner(190):
Evict completed UnassignProcedure table=hbase:meta, region=1588230740, server=hw10447.local,64242,1497985650605
pid=12, owner=jelser, state=SUCCESS, submittedTime=17sec ago, lastUpdate=17sec ago
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message