hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13891) AM should handle RegionServerStoppedException during assignment
Date Mon, 15 Jun 2015 18:30:01 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586453#comment-14586453 ]

Enis Soztutar commented on HBASE-13891:
---------------------------------------

{{reuseDestinationHost}} is a bit confusing, since in my initial read of the patch, I thought
that if {{reuseDestinationHost==true}}, we would always be retrying on the same host (which is not
the case). Can we change the name to {{sameHostCanBeReused}} or something similar to that effect?

Other than that, PLGTM. I would not put it into 1.1.1, though. It is a bit risky at the last minute.



> AM should handle RegionServerStoppedException during assignment
> ---------------------------------------------------------------
>
>                 Key: HBASE-13891
>                 URL: https://issues.apache.org/jira/browse/HBASE-13891
>             Project: HBase
>          Issue Type: Bug
>          Components: master, Region Assignment
>    Affects Versions: 1.1.0.1
>            Reporter: Nick Dimiduk
>         Attachments: 13891.patch
>
>
> I noticed the following in the master logs
> {noformat}
> 2015-06-11 11:04:55,278 WARN  [AM.ZK.Worker-pool2-t337] master.AssignmentManager: Failed assignment of SYSTEM.SEQUENCE,\x8E\x00\x00\x00,1434010321127.d2be67cf43d6bd600c7f461701ca908f. to ip-172-31-32-232.ec2.internal,16020,1434020633773, trying to assign elsewhere instead; try=1 of 10
> org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server ip-172-31-32-232.ec2.internal,16020,1434020633773 not running, aborting
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:980)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1382)
> 	at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22117)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> 	at java.lang.Thread.run(Thread.java:745)
> 	at sun.reflect.GeneratedConstructorAccessor26.newInstance(Unknown Source)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> 	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
> 	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:322)
> 	at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:752)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2136)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1590)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1568)
> 	at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:106)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.handleRegion(AssignmentManager.java:1063)
> 	at org.apache.hadoop.hbase.master.AssignmentManager$6.run(AssignmentManager.java:1511)
> 	at org.apache.hadoop.hbase.master.AssignmentManager$3.run(AssignmentManager.java:1295)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.regionserver.RegionServerStoppedException): org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server ip-172-31-32-232.ec2.internal,16020,1434020633773 not running, aborting
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:980)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1382)
> 	at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22117)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> 	at java.lang.Thread.run(Thread.java:745)
> 	at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1206)
> 	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
> 	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
> 	at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:23003)
> 	at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:749)
> 	... 12 more
> ...
> 2015-06-11 11:04:55,289 INFO  [AM.ZK.Worker-pool2-t337] master.AssignmentManager: Assigning SYSTEM.SEQUENCE,\x8E\x00\x00\x00,1434010321127.d2be67cf43d6bd600c7f461701ca908f. to ip-172-31-32-232.ec2.internal,16020,1434020633773
> ...
> 2015-06-11 11:04:55,317 WARN  [AM.ZK.Worker-pool2-t337] master.AssignmentManager: Failed assignment of SYSTEM.SEQUENCE,\x8E\x00\x00\x00,1434010321127.d2be67cf43d6bd600c7f461701ca908f. to ip-172-31-32-232.ec2.internal,16020,1434020633773, trying to assign elsewhere instead; try=2 of 10
> <same long stack redacted>
> ...
> 2015-06-11 11:04:55,332 INFO  [AM.ZK.Worker-pool2-t337] master.AssignmentManager: Assigning SYSTEM.SEQUENCE,\x8E\x00\x00\x00,1434010321127.d2be67cf43d6bd600c7f461701ca908f. to ip-172-31-32-232.ec2.internal,16020,1434020633773
> {noformat}
> This is repeated over and over as the AM spams the same region to the same server. Probably
> the {{RegionServerStoppedException}} should be detected and the destination of the plan be
> added to the dead server list.
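
One way to read the suggestion above is as a retry loop that stops offering a server once it has reported itself stopped. The following is only a minimal, self-contained Java sketch of that idea, not the code from 13891.patch or from AssignmentManager. The class {{AssignRetrySketch}}, the {{ServerStoppedException}} type, the {{RegionOpener}} interface, and all method names are hypothetical stand-ins, and servers are plain strings rather than HBase server objects.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AssignRetrySketch {

    /** Hypothetical stand-in for RegionServerStoppedException. */
    static class ServerStoppedException extends Exception {
        ServerStoppedException(String server) {
            super("Server " + server + " not running, aborting");
        }
    }

    /** Hypothetical hook for the "open region" RPC sent to a server. */
    interface RegionOpener {
        void open(String region, String server) throws ServerStoppedException;
    }

    /** Returns the first candidate that is not excluded, or null if none is left. */
    static String pickDestination(List<String> candidates, Set<String> excluded) {
        for (String server : candidates) {
            if (!excluded.contains(server)) {
                return server;
            }
        }
        return null;
    }

    /**
     * Tries to open the region on some candidate server, up to maxAttempts times.
     * A server that answers with ServerStoppedException is remembered and never
     * picked again, so the same region is not re-sent to it on the next attempt.
     * Returns the server that accepted the region, or null if none did.
     */
    static String assign(String region, List<String> candidates,
                         RegionOpener opener, int maxAttempts) {
        Set<String> stopped = new HashSet<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String destination = pickDestination(candidates, stopped);
            if (destination == null) {
                return null; // every candidate has reported itself stopped
            }
            try {
                opener.open(region, destination);
                return destination;
            } catch (ServerStoppedException e) {
                // Assumption of this sketch: treat the server as unusable for
                // the rest of this assignment instead of retrying it.
                stopped.add(destination);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("rs-a", "rs-b");
        // rs-a always reports that it is stopped; rs-b accepts the region.
        RegionOpener opener = (region, server) -> {
            if (server.equals("rs-a")) {
                throw new ServerStoppedException(server);
            }
        };
        System.out.println(assign("region-1", servers, opener, 10)); // prints rs-b
    }
}
```

In this sketch the exclusion set lives only for the duration of one {{assign}} call. Adding the server to a shared dead server list, as the description proposes, would additionally keep other regions from being sent to it.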



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
