hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: TestVisibilityLabelsWithACL is flakey, fails frequently
Date Fri, 04 Dec 2015 19:00:12 GMT
I see. Any chance of a similar hack in this test? Or disabling this test in
all but the master branch? Or a generic version of your hack (probably not)?

Getting a successful test run requires us to go through all unit tests
twice, first on jdk7 and then on jdk8. The probability of failure is high
(smile), or at least the probability of flakies raising their heads. It's a
pity, after running thousands of unit tests, to have the whole run fail
because of a single missed watcher.
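The point about running the full suite twice can be made concrete with a simple model. Assume, purely for illustration, that tests fail independently and that flaky test i fails with probability p_i on any single run. Because every test runs once on jdk7 and once on jdk8, a build stays green only if every flaky test passes both times:

$$
P(\text{build fails}) = 1 - \prod_{i=1}^{k} (1 - p_i)^{2}
$$

For a single flaky test with a hypothetical p = 0.05, that gives 1 - 0.95^2 = 1 - 0.9025 = 0.0975, so roughly one build in ten fails for a reason unrelated to the change under test, and each additional flaky test pushes that figure higher.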

Do you think the test is written wrong, then, Andrew? That it should be
written more defensively, prepared for a missed watcher? If the latter, I
could disable it until that has been addressed?
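One reading of "written more defensively" is: do not depend on a single watch notification to advance state; on every poll, re-read the authoritative value and catch up if a notification was missed. Standard ZooKeeper watches are one-time triggers, so a lost trigger stays lost unless something re-reads the znode. The Java below is a minimal, self-contained model of that idea, not HBase code: `VersionedStore` stands in for the znode, `WatchedCache` for whatever in-process state the watcher normally updates, and `DefensiveWait` is a hypothetical helper. Whether the AccessController's state can actually be refreshed this way is an assumption, not something established in this thread.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a znode: a value with a monotonically increasing version.
final class VersionedStore {
    private final AtomicLong version = new AtomicLong();

    long bump() { return version.incrementAndGet(); } // simulate a znode update
    long read() { return version.get(); }             // simulate re-reading the znode
}

// Hypothetical stand-in for in-process state that is normally advanced by a watcher.
final class WatchedCache {
    private final VersionedStore store;
    private volatile long seenVersion;

    WatchedCache(VersionedStore store) {
        this.store = store;
        this.seenVersion = store.read();
    }

    // Normal path: invoked when a watch notification is delivered.
    void onNotification() { seenVersion = store.read(); }

    // Defensive path: consult the source of truth directly, so a lost
    // notification only delays the update instead of losing it.
    boolean refreshIfStale() {
        long current = store.read();
        if (current != seenVersion) {
            seenVersion = current;
            return true;
        }
        return false;
    }

    long seenVersion() { return seenVersion; }
}

final class DefensiveWait {
    private DefensiveWait() {}

    // Waits until the cache has observed at least targetVersion. Each poll
    // re-reads the store, so the wait succeeds even if no notification arrives.
    // Returns false on timeout or interrupt rather than hanging.
    static boolean awaitVersion(WatchedCache cache, long targetVersion,
                                long timeoutMs, long intervalMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            cache.refreshIfStale();
            if (cache.seenVersion() >= targetVersion) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

The design choice is that the watcher becomes an optimization (fast path) rather than the only way state moves forward; the poll is the safety net, and the timeout still turns a genuinely stuck state into a prompt failure.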

Thanks for the back and forth,
St.Ack

On Fri, Dec 4, 2015 at 10:04 AM, Andrew Purtell <apurtell@apache.org> wrote:

> Would be a pity to disable the test. On the other hand, we seem to flake
> wherever we use watcher triggers in miniclusters to move state forward.
> That's fixed by porting the notification to ProcV2. Otherwise, we hack
> around the edges (like HBASE-14209).
>
> On Fri, Dec 4, 2015 at 9:47 AM, Stack <stack@duboce.net> wrote:
>
> > It shuts down fine. It just fails too often in the scheme of things. I
> > could just disable it.
> > St.Ack
> >
> > On Fri, Dec 4, 2015 at 9:42 AM, Andrew Purtell <apurtell@apache.org> wrote:
> >
> > > > Snapshot of AccessController state does not include instance on region
> > >
> > > We update a znode and wait for a state change driven by processing a
> > > watch notification for the znode change. The watch notification is
> > > apparently lost. Yeah, once that happens the test is dead. It shouldn't
> > > hang indefinitely; the predicate should only wait for 10 seconds, then
> > > error out. If that isn't happening, we've got some kind of test shutdown
> > > hang bug.
> > >
> > >
> > >
> > > On Fri, Dec 4, 2015 at 9:29 AM, Stack <stack@duboce.net> wrote:
> > >
> > > > Anyone up for taking a look at this flakey test?
> > > >
> > > > See here for example:
> > > >
> > > > https://builds.apache.org/view/H-L/view/HBase/job/HBase-1.2/419/jdk=latest1.7,label=Hadoop/testReport/junit/org.apache.hadoop.hbase.security.visibility/TestVisibilityLabelsWithACL/org_apache_hadoop_hbase_security_visibility_TestVisibilityLabelsWithACL/
> > > >
> > > > I see it fail from time to time.
> > > >
> > > > Something is odd. It says we time out on setup after ten seconds.
> > > > Digging in more, I see this around startup:
> > > >
> > > >
> > > > 2015-12-02 23:08:42,790 DEBUG [B.defaultRpcServer.handler=1,queue=0,port=47849] ipc.CallRunner(112): B.defaultRpcServer.handler=1,queue=0,port=47849: callId: 0 service: RegionServerStatusService methodName: RegionServerStartup size: 45 connection: 67.195.81.153:43968
> > > > org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
> > > >         at org.apache.hadoop.hbase.master.HMaster.checkServiceStarted(HMaster.java:2265)
> > > >         at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:351)
> > > >         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
> > > >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2168)
> > > >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
> > > >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> > > >         at org.apache.
> > > > ...[truncated 182514 chars]...
> > > > ecureTestUtil$1(333): Snapshot of AccessController state does not include instance on region hbase:acl,,1449097729021.ec6be7579802c2fa1182dc62f5fb6137.
> > > > 2015-12-02 23:09:00,167 ERROR [main] access.SecureTestUtil$1(333): Snapshot of AccessController state does not include instance on region hbase:acl,,1449097729021.ec6be7579802c2fa1182dc62f5fb6137.
> > > > 2015-12-02 23:09:00,275 ERROR [main] access.SecureTestUtil$1(333): Snapshot of AccessController state does not include instance on region hbase:acl,,1449097729021.ec6be7579802c2fa1182dc62f5fb6137.
> > > >
> > > > ....
> > > >
> > > > We seem to just hang.
> > > >
> > > > Thanks,
> > > >
> > > > St.Ack
> > > >
> > >
> > > --
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > (via Tom White)
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
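Andrew's note that the predicate "should only wait for 10 seconds, then error out" describes a bounded wait: poll a condition until a deadline, then fail instead of hanging. Below is a minimal sketch of that pattern, assuming nothing about HBase internals; `BoundedWait` is a hypothetical helper, not the code `SecureTestUtil` actually uses.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not HBase's own test utility code.
final class BoundedWait {
    private BoundedWait() {}

    // Polls condition every intervalMs until it is true or timeoutMs elapses.
    // Returns true on success, false on timeout (or if interrupted), so a lost
    // notification turns into a prompt failure rather than a hang.
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNs = deadline - System.nanoTime();
            if (remainingNs <= 0) {
                return false;
            }
            // Sleep one interval, but not much past the deadline.
            long sleepMs = Math.max(1L, Math.min(intervalMs, remainingNs / 1_000_000L));
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    // Like waitFor, but fails with an AssertionError naming what was awaited.
    static void waitForOrFail(long timeoutMs, long intervalMs,
                              BooleanSupplier condition, String what) {
        if (!waitFor(timeoutMs, intervalMs, condition)) {
            throw new AssertionError(
                "Timed out after " + timeoutMs + " ms waiting for " + what);
        }
    }
}
```

In a test like this one, the call might look like `BoundedWait.waitForOrFail(10_000, 100, () -> aclSnapshotIncludesRegion(), "AccessController on hbase:acl")`, where `aclSnapshotIncludesRegion()` is a hypothetical check. Comparing `System.nanoTime()` values by subtraction keeps the deadline check correct even if the counter wraps, and returning a boolean from `waitFor` leaves the caller free to retry or fail with its own message.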
