hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-19868) TestCoprocessorWhitelistMasterObserver is flakey
Date Tue, 30 Jan 2018 04:58:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-19868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344504#comment-16344504
] 

stack commented on HBASE-19868:
-------------------------------

[~psomogyi] How'd you extract the exception? Going about nightlies, I was unable to get good
log. The exception seems to come after the log run at 

02:36:36,686 (Log here ends at [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/203/testReport/junit/org.apache.hadoop.hbase.security.access/TestCoprocessorWhitelistMasterObserver/org_apache_hadoop_hbase_security_access_TestCoprocessorWhitelistMasterObserver/] 
2018-01-26 02:36:36,440)

I can't make the test fail locally which I'm guessing is what you are finding.

(I was looking to see why we are not archiving the surefire test run output seemingly. It
looks like the flags are in place but i can't find the raw emissions. Need to dig more).

So, the odd thing about this test is conf.setInt("hbase.client.retries.number", 1) (I think).
If stuff is slow around startup, we'll fail our one attempt. Its happening in

ClusterSchemaServiceImpl which is cast as a Guava Service. It is being started async. Are
we stuck in 

isTableAvailableAndInitialized failing our one attempt over and over. I can't tell. Or is
it that Master fails to come up and then we are just stuck in mini cluster startup trying
to scan a meta that is not coming? Let me try some debug and up the retries to 5.... Test
should still fail fast....

 

> TestCoprocessorWhitelistMasterObserver is flakey
> ------------------------------------------------
>
>                 Key: HBASE-19868
>                 URL: https://issues.apache.org/jira/browse/HBASE-19868
>             Project: HBase
>          Issue Type: Bug
>          Components: flakey, test
>    Affects Versions: 2.0.0-beta-1
>            Reporter: Peter Somogyi
>            Priority: Major
>             Fix For: 2.0.0-beta-2
>
>
> TestCoprocessorWhitelistMasterObserver is failing 33% of the time. In the logs it looks
like the failure is related to Master initialization.
> Following log is from [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/203] 
> {noformat}
> 2018-01-26 02:36:36,686 WARN [M:0;1f0c4777c1ba:35049] master.TableNamespaceManager(307):
Caught exception in initializing namespace table manager
> org.apache.hadoop.hbase.DoNotRetryIOException: hconnection-0x18cd2ac8 closed
> at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722)
> at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
> at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714)
> at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
> at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:684)
> at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
> at org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:562)
> at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getRegionLocation(ConnectionUtils.java:131)
> at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:73)
> at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:223)
> at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:388)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:362)
> at org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:141)
> at org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:281)
> at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:103)
> at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62)
> at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226)
> at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1059)
> at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:921)
> at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2034)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553)
> at java.lang.Thread.run(Thread.java:748)
> 2018-01-26 02:36:36,691 ERROR [M:0;1f0c4777c1ba:35049] helpers.MarkerIgnoringBase(159):
Failed to become active master
> java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED]
to be RUNNING, but the service has FAILED
> at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
> at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
> at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1061)
> at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:921)
> at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2034)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: hconnection-0x18cd2ac8 closed
> at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722)
> at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
> at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714)
> at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
> at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:684)
> at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
> at org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:562)
> at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getRegionLocation(ConnectionUtils.java:131)
> at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:73)
> at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:223)
> at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:388)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:362)
> at org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:141)
> at org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:281)
> at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:103)
> at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62)
> at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226)
> at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1059)
> ... 4 more
> 2018-01-26 02:36:36,691 ERROR [M:0;1f0c4777c1ba:35049] helpers.MarkerIgnoringBase(143):
Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.TestCoprocessorWhitelistMasterObserver$TestRegionObserver,
org.apache.hadoop.hbase.security.access.CoprocessorWhitelistMasterObserver, org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
> 2018-01-26 02:36:36,691 ERROR [M:0;1f0c4777c1ba:35049] helpers.MarkerIgnoringBase(159):
***** ABORTING master 1f0c4777c1ba,35049,1516934184742: Unhandled exception. Starting shutdown.
*****
> java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED]
to be RUNNING, but the service has FAILED
> at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
> at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
> at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1061)
> at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:921)
> at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2034)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: hconnection-0x18cd2ac8 closed
> at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722)
> at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
> at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714)
> at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
> at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:684)
> at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
> at org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:562)
> at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getRegionLocation(ConnectionUtils.java:131)
> at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:73)
> at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:223)
> at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:388)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:362)
> at org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:141)
> at org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:281)
> at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:103)
> at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62)
> at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226)
> at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1059)
> ... 4 more
> 2018-01-26 02:36:36,692 DEBUG [M:0;1f0c4777c1ba:35049] coprocessor.CoprocessorHost(289):
Stop coprocessor org.apache.hadoop.hbase.security.access.CoprocessorWhitelistMasterObserver
> 2018-01-26 02:36:36,692 WARN [M:0;1f0c4777c1ba:35049] coprocessor.BaseEnvironment(99):
Not stopping coprocessor org.apache.hadoop.hbase.security.access.CoprocessorWhitelistMasterObserver
because not active (state=STOPPED)
> 2018-01-26 02:36:36,692 INFO [M:0;1f0c4777c1ba:35049] regionserver.HRegionServer(2142):
***** STOPPING region server '1f0c4777c1ba,35049,1516934184742' *****{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message