hbase-issues mailing list archives

From "Nick Dimiduk (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-18613) Race condition between master restart and test code when restoring distributed cluster after integration test
Date Fri, 01 Dec 2017 04:35:00 GMT

     [ https://issues.apache.org/jira/browse/HBASE-18613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Dimiduk updated HBASE-18613:
---------------------------------
    Fix Version/s:     (was: 1.1.13)
                       (was: 1.2.7)
                       (was: 1.3.2)
                       (was: 1.4.0)
                       (was: 2.0.0)

Removing fixVersions from ticket closed as invalid.

> Race condition between master restart and test code when restoring distributed cluster
after integration test
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18613
>                 URL: https://issues.apache.org/jira/browse/HBASE-18613
>             Project: HBase
>          Issue Type: Bug
>          Components: integration tests
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Minor
>
> Noticed the following in some internal testing (line numbers likely are skewed)
> {noformat}
> 2017-08-16 21:20:25,557| 2017-08-16 21:20:25,553 WARN  [main] client.ConnectionManager$HConnectionImplementation:
Checking master connection
> 2017-08-16 21:20:25,557| com.google.protobuf.ServiceException: org.apache.hadoop.hbase.exceptions.ConnectionClosingException:
Call to master1.domain.com/10.0.2.131:16000 failed on local exception: org.apache.hadoop.hbase.exceptions.ConnectionClosingException:
Connection to master1.domain.com/10.0.2.131:16000 is closing. Call id=581, waitTime=1
> 2017-08-16 21:20:25,557| at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:223)
> 2017-08-16 21:20:25,558| at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
> 2017-08-16 21:20:25,560| at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:62739)
> 2017-08-16 21:20:25,560| at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(ConnectionManager.java:1448)
> 2017-08-16 21:20:25,561| at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(ConnectionManager.java:2124)
> 2017-08-16 21:20:25,561| at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1712)
> 2017-08-16 21:20:25,562| at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getMaster(ConnectionManager.java:1701)
> 2017-08-16 21:20:25,562| at org.apache.hadoop.hbase.DistributedHBaseCluster.getMasterAdminService(DistributedHBaseCluster.java:153)
> 2017-08-16 21:20:25,563| at org.apache.hadoop.hbase.DistributedHBaseCluster.waitForActiveAndReadyMaster(DistributedHBaseCluster.java:184)
> 2017-08-16 21:20:25,563| at org.apache.hadoop.hbase.HBaseCluster.waitForActiveAndReadyMaster(HBaseCluster.java:204)
> 2017-08-16 21:20:25,563| at org.apache.hadoop.hbase.DistributedHBaseCluster.restoreMasters(DistributedHBaseCluster.java:278)
> 2017-08-16 21:20:25,563| at org.apache.hadoop.hbase.DistributedHBaseCluster.restoreClusterStatus(DistributedHBaseCluster.java:239)
> 2017-08-16 21:20:25,563| at org.apache.hadoop.hbase.HBaseCluster.restoreInitialStatus(HBaseCluster.java:235)
> 2017-08-16 21:20:25,564| at org.apache.hadoop.hbase.IntegrationTestingUtility.restoreCluster(IntegrationTestingUtility.java:99)
> 2017-08-16 21:20:25,564| at org.apache.hadoop.hbase.IntegrationTestBase.cleanUpCluster(IntegrationTestBase.java:200)
> 2017-08-16 21:20:25,564| at org.apache.hadoop.hbase.IntegrationTestDDLMasterFailover.cleanUpCluster(IntegrationTestDDLMasterFailover.java:146)
> 2017-08-16 21:20:25,564| at org.apache.hadoop.hbase.IntegrationTestBase.cleanUp(IntegrationTestBase.java:140)
> 2017-08-16 21:20:25,564| at org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:125)
> 2017-08-16 21:20:25,565| at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112)
> 2017-08-16 21:20:25,565| at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> 2017-08-16 21:20:25,565| at org.apache.hadoop.hbase.IntegrationTestDDLMasterFailover.main(IntegrationTestDDLMasterFailover.java:832)
> 2017-08-16 21:20:25,566| Caused by: org.apache.hadoop.hbase.exceptions.ConnectionClosingException:
Call to master1.domain.com/10.0.2.131:16000 failed on local exception: org.apache.hadoop.hbase.exceptions.ConnectionClosingException:
Connection to master1.domain.com/10.0.2.131:16000 is closing. Call id=581, waitTime=1
> 2017-08-16 21:20:25,566| at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1258)
> 2017-08-16 21:20:25,566| at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1229)
> 2017-08-16 21:20:25,566| at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
> 2017-08-16 21:20:25,566| ... 20 more
> 2017-08-16 21:20:25,566| Caused by: org.apache.hadoop.hbase.exceptions.ConnectionClosingException:
Connection to master1.domain.com/10.0.2.131:16000 is closing. Call id=581, waitTime=1
> 2017-08-16 21:20:25,567| at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.cleanupCalls(RpcClientImpl.java:1047)
> 2017-08-16 21:20:25,567| at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:846)
> 2017-08-16 21:20:25,567| at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.run(RpcClientImpl.java:574)
> {noformat}
> This happens while the IntegrationTest harness is resetting the state of the distributed cluster.
When dealing with "slow" nodes, the restart of the previously active master could be delayed,
which causes the test code to see a ConnectionClosingException (wrapped in a ServiceException).
> I think we want to just consume this Exception, same as MasterNotRunningException and
ZooKeeperConnectionException, in {{DistributedHBaseCluster#waitForActiveAndReadyMaster(long)}}.
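
For illustration only, the proposal in the description could look roughly like the sketch below. It is not the HBase implementation and not a committed fix (the ticket was closed as invalid). The exception classes are hypothetical stand-ins that only borrow the names from the stack trace, and MasterProbe, isRetryable, and the timeout and sleep parameters are assumptions made so the example compiles without HBase on the classpath.

```java
// Sketch: a wait loop that consumes "master not ready yet" failures,
// including a ConnectionClosingException wrapped in a ServiceException.
class MasterWaitSketch {
    // Hypothetical stand-ins for the HBase/protobuf exception types named in the issue.
    static class MasterNotRunningException extends Exception {}
    static class ZooKeeperConnectionException extends Exception {}
    static class ConnectionClosingException extends Exception {}
    static class ServiceException extends Exception {
        ServiceException(Throwable cause) { super(cause); }
    }

    // Hypothetical probe standing in for the master connectivity check.
    interface MasterProbe {
        boolean isMasterRunning() throws Exception;
    }

    // Walk the cause chain so a wrapped ConnectionClosingException is also recognised.
    static boolean isRetryable(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof MasterNotRunningException
                    || c instanceof ZooKeeperConnectionException
                    || c instanceof ConnectionClosingException) {
                return true;
            }
        }
        return false;
    }

    // Returns true once the probe succeeds, false if the timeout elapses first.
    // Failures that are not considered retryable are rethrown unchecked.
    static boolean waitForActiveAndReadyMaster(MasterProbe probe, long timeoutMs, long sleepMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            try {
                if (probe.isMasterRunning()) {
                    return true;
                }
            } catch (Exception e) {
                if (!isRetryable(e)) {
                    throw new IllegalStateException("Unexpected failure while waiting for master", e);
                }
                // Retryable: the master is probably still restarting, so try again.
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return false;
    }
}
```

Checking the whole cause chain, rather than only the outermost exception, matters here because the trace shows the ConnectionClosingException arriving wrapped in a ServiceException.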



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
