hbase-issues mailing list archives

From "Jimmy Xiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12380) Too many attempts to open a region can crash the RegionServer
Date Thu, 30 Oct 2014 16:52:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190360#comment-14190360
] 

Jimmy Xiang commented on HBASE-12380:
-------------------------------------

I have discussed it with Esteban. We agree that it is better not to abort. We can log a warning/error
message instead and let it go.

The reason for aborting is that this scenario should never happen naturally. The master has a state
machine and won't send the open call again if the region is already open.
My concern with not aborting is that we may hide a serious bug in the master if that does
happen.
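
To make the trade-off concrete, here is a minimal, self-contained sketch of the two behaviors. It is not the HBase code: DuplicateOpenSketch, OpenResult, and the abortOnDuplicate flag are hypothetical names used only for illustration, and the real check in RSRpcServices.openRegion is more involved.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

// Hypothetical stand-in for a region server; NOT HBase's RSRpcServices.
final class DuplicateOpenSketch {
    enum OpenResult { OPENED, ALREADY_OPENED }

    private static final Logger LOG =
        Logger.getLogger(DuplicateOpenSketch.class.getName());

    // Encoded names of the regions currently online on this toy server.
    private final Set<String> onlineRegions = ConcurrentHashMap.newKeySet();
    // true models the abort-on-duplicate behavior, false the proposed log-and-continue one.
    private final boolean abortOnDuplicate;
    private volatile boolean aborted = false;

    DuplicateOpenSketch(boolean abortOnDuplicate) {
        this.abortOnDuplicate = abortOnDuplicate;
    }

    OpenResult openRegion(String encodedRegionName) {
        // add() returns false when the name is already present,
        // so the "already online" check and the update are one atomic step.
        if (onlineRegions.add(encodedRegionName)) {
            return OpenResult.OPENED;
        }
        if (abortOnDuplicate) {
            LOG.severe("Received OPEN for region " + encodedRegionName
                + ", which is already online; aborting");
            aborted = true;
        } else {
            // Loud enough that a misbehaving master is still visible in the logs.
            LOG.warning("Received OPEN for region " + encodedRegionName
                + ", which is already online; ignoring");
        }
        return OpenResult.ALREADY_OPENED;
    }

    boolean isAborted() {
        return aborted;
    }

    public static void main(String[] args) {
        DuplicateOpenSketch server = new DuplicateOpenSketch(false);
        // Repeated OPEN calls for the same region, as a test might issue them.
        for (int attempt = 1; attempt <= 20; attempt++) {
            server.openRegion("025198143197ea68803e49819eae27ca");
        }
        System.out.println("aborted = " + server.isAborted());
    }
}
```

With the flag off, duplicate requests are answered with ALREADY_OPENED and the server keeps running; the warning is what remains to expose a master-side bug.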

This test is an old test. My suggestion is to remove this test.

> Too many attempts to open a region can crash the RegionServer
> -------------------------------------------------------------
>
>                 Key: HBASE-12380
>                 URL: https://issues.apache.org/jira/browse/HBASE-12380
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Esteban Gutierrez
>            Priority: Critical
>
> Noticed this while trying to fix a faulty test while working on a fix for HBASE-12219:
> {code}
> Tests in error:
>   TestRegionServerNoMaster.testMultipleOpen:237 » Service java.io.IOException: R...
>   TestRegionServerNoMaster.testCloseByRegionServer:211->closeRegionNoZK:201 » Service
> {code}
> Initially I thought the problem was on my patch for HBASE-12219 but I noticed that the
issue was occurring on the 7th attempt to open the region. However I was able to reproduce
the same problem in the master branch after increasing the number of requests in testMultipleOpen():
> {code}
> 2014-10-29 15:03:45,043 INFO  [Thread-216] regionserver.RSRpcServices(1334): Receiving
OPEN for the region:TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca.,
which we are already trying to OPEN - ignoring this new request for this region.
> Submitting openRegion attempt: 16 <====
> 2014-10-29 15:03:45,044 INFO  [Thread-216] regionserver.RSRpcServices(1311): Open TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca.
> 2014-10-29 15:03:45,044 INFO  [PostOpenDeployTasks:025198143197ea68803e49819eae27ca]
hbase.MetaTableAccessor(1307): Updated row TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca.
with server=192.168.1.105,63082,1414620220789
> Submitting openRegion attempt: 17 <====
> 2014-10-29 15:03:45,046 ERROR [RS_OPEN_REGION-192.168.1.105:63082-2] handler.OpenRegionHandler(88):
Region 025198143197ea68803e49819eae27ca was already online when we started processing the
opening. Marking this new attempt as failed
> 2014-10-29 15:03:45,047 FATAL [Thread-216] regionserver.HRegionServer(1931): ABORTING
region server 192.168.1.105,63082,1414620220789: Received OPEN for the region:TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca.,
which is already online
> 2014-10-29 15:03:45,047 FATAL [Thread-216] regionserver.HRegionServer(1937): RegionServer
abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
> 2014-10-29 15:03:45,054 WARN  [Thread-216] regionserver.HRegionServer(1955): Unable to
report fatal error to master
> com.google.protobuf.ServiceException: java.io.IOException: Call to /192.168.1.105:63079
failed on local exception: java.io.IOException: Connection to /192.168.1.105:63079 is closing.
Call id=4, waitTime=2
>         at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1707)
>         at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1757)
>         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.reportRSFatalError(RegionServerStatusProtos.java:8301)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1952)
>         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.abortRegionServer(MiniHBaseCluster.java:174)
>         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$100(MiniHBaseCluster.java:108)
>         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$2.run(MiniHBaseCluster.java:167)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:356)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
>         at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:277)
>         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.abort(MiniHBaseCluster.java:165)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1964)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1308)
>         at org.apache.hadoop.hbase.regionserver.TestRegionServerNoMaster.testMultipleOpen(TestRegionServerNoMaster.java:237)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>         at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>         at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.io.IOException: Call to /192.168.1.105:63079 failed on local exception:
java.io.IOException: Connection to /192.168.1.105:63079 is closing. Call id=4, waitTime=2
>         at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1563)
>         at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1534)
>         at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1692)
>         ... 23 more
> Caused by: java.io.IOException: Connection to /192.168.1.105:63079 is closing. Call id=4,
waitTime=2
>         at org.apache.hadoop.hbase.ipc.RpcClient$Connection.cleanupCalls(RpcClient.java:1257)
>         at org.apache.hadoop.hbase.ipc.RpcClient$Connection.close(RpcClient.java:1063)
>         at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:791)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
