hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5833) 0.92 build has been failing pretty consistently on TestMasterFailover....
Date Sun, 22 Apr 2012 05:10:47 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258998#comment-13258998

stack commented on HBASE-5833:

More digging.  The newest test added here, testShouldCheckMasterFailOverWhenMETAIsInOpenedState,
is a little interesting.  It was added by this commit:

r1172063 | tedyu | 2011-09-17 13:27:00 -0700 (Sat, 17 Sep 2011) | 3 lines

HBASE-4400  .META. getting stuck if RS hosting it is dead and znode state is in
               RS_ZK_REGION_OPENED (Ramkrishna)


The test is a bunch of copy/paste confirming stuff it's not using.  It then does a cluster
shutdown, but explicitly on a cluster object rather than via HBaseTestingUtility, though
it subsequently starts a cluster with HBaseTestingUtility.  Not using HTU for both
the shutdown and the startup can leave the HTU state confused about whether there is a master
available, so we just wait forever.  This seems to be responsible for the case where the test
would time out after 15 minutes and report no tests run and none failed.

I added a timeout for this test of 3 minutes.
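
As an illustration of why a bound helps here, the following is a minimal sketch in plain JDK Java (no HBase) of a polling wait that gives up at a deadline instead of blocking forever when the state it polls never changes. `BoundedWait`, `waitFor`, and the `masterAvailable` flag are hypothetical names for this example only. The actual change to the test was just the 3 minute timeout, presumably JUnit's own test timeout given the `FailOnTimeout` frame in the trace in this message.

```java
import java.util.function.BooleanSupplier;

final class BoundedWait {
    /**
     * Polls the condition every 10 ms. Returns true as soon as it holds,
     * or false once timeoutMillis has elapsed (or the thread is interrupted).
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // deadline passed: give up instead of hanging
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for stale bookkeeping: nothing ever flips this flag,
        // so an unbounded wait on it would never return.
        boolean[] masterAvailable = {false};
        boolean ok = waitFor(() -> masterAvailable[0], 200);
        System.out.println(ok ? "master available" : "gave up waiting for master");
    }
}
```

A bound like this turns a silent hang into a visible failure, which is easier to diagnose than a build that is killed from outside.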

Another interesting thing is that TestMasterFailover starts a cluster per method, but shutdown
leaves some threads around.  I dug in some and was able to clean up an LruBlockCache eviction
thread, but others persist and would take a little more work to undo.  They seem harmless but
I'll list them anyway:

TestMasterFailover [JUnit]	
	org.eclipse.jdt.internal.junit.runner.RemoteTestRunner at localhost:54811	
		Thread [main] (Running)	
		Thread [ReaderThread] (Running)	
		Thread [Thread-2] (Suspended (breakpoint at line 587 in HBaseTestingUtility))	
			HBaseTestingUtility.shutdownMiniCluster() line: 587	
			TestMasterFailover.testSimpleMasterFailover() line: 178	
			NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native
			NativeMethodAccessorImpl.invoke(Object, Object[]) line: 39	
			DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 25	
			Method.invoke(Object, Object...) line: 597	
			FrameworkMethod$1.runReflectiveCall() line: 45	
			FrameworkMethod$1(ReflectiveCallable).run() line: 15	
			FrameworkMethod.invokeExplosively(Object, Object...) line: 42	
			InvokeMethod.evaluate() line: 20	
			FailOnTimeout$StatementThread.run() line: 62	
		Daemon Thread [Poller SunPKCS11-Darwin] (Running)	
		Thread [pool-1-thread-1] (Running)	
		Thread [pool-2-thread-1] (Running)	
		Thread [pool-3-thread-1] (Running)	
		Thread [pool-4-thread-1] (Running)	
		Daemon Thread [LeaseChecker] (Running)	
		Daemon Thread [RegionServer:2;,54842,1335066804457.decayingSampleTick.1] (Running)

		Daemon Thread [Master:2;,54838,1335066803952-SendThread(fe80:0:0:0:0:0:0:1%1:21818)]
		Daemon Thread [Master:2;,54838,1335066803952-EventThread] (Running)	
		Daemon Thread [Master:1;,54836,1335066798880-EventThread] (Running)	
		Daemon Thread [Master:1;,54836,1335066798880-SendThread(localhost:21818)] (Running)

	/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java (Apr 21, 2012 8:53:07
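
A listing like the one above came from a debugger, but a similar check can be done in code. The sketch below uses only the JDK's `Thread.getAllStackTraces()` to snapshot live thread names and report the ones that appeared since an earlier snapshot. `LeftoverThreads`, `liveThreadNames`, and `newSince` are hypothetical names for illustration and are not part of HBase or of this patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LeftoverThreads {
    /** Returns a sorted list of labels for all currently live threads. */
    static List<String> liveThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive()) {
                names.add((t.isDaemon() ? "Daemon " : "") + "Thread [" + t.getName() + "]");
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Returns labels of live threads that were not in the earlier snapshot. */
    static List<String> newSince(List<String> before) {
        List<String> after = liveThreadNames();
        after.removeAll(before);
        return after;
    }

    public static void main(String[] args) {
        List<String> before = liveThreadNames();
        // ... start a cluster, run the test body, shut the cluster down ...
        for (String name : newSince(before)) {
            System.out.println("still running after shutdown: " + name);
        }
    }
}
```

Taking the first snapshot before cluster startup and the second after shutdown would show which threads a test method leaves behind. Threads with the same name in both snapshots are not reported, so this is a rough check rather than an exact one.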

The thread names are enhanced in v2 of this patch, but things like decayingSampleTick are
set in a static and so are hard to get rid of in test setup.  The SendThread/EventThread are zk
client threads left hanging around.  Not sure what pool-4-thread-1 is (I've enhanced the HTable
executor to include htable in its thread names so these are identifiable going forward, but the
above executor does not seem to be HTable's).
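
For reference, anonymous names such as pool-4-thread-1 are what the JDK's default thread factory produces. A minimal sketch of giving an executor identifiable thread names with a custom `ThreadFactory` follows. `NamedThreadFactory` and the "htable-pool" prefix are assumptions made for this example, not the code from the patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        // Name each thread <prefix>-thread-<n> so it can be traced to its owner.
        Thread t = new Thread(r, prefix + "-thread-" + counter.getAndIncrement());
        t.setDaemon(true); // assumption: pool threads should not block JVM exit
        return t;
    }

    /** Convenience: a fixed-size pool whose threads carry the given prefix. */
    static ExecutorService newPool(String prefix, int size) {
        return Executors.newFixedThreadPool(size, new NamedThreadFactory(prefix));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = newPool("htable-pool", 2);
        pool.submit(() -> System.out.println(Thread.currentThread().getName())).get();
        pool.shutdown();
    }
}
```

With a prefix per owner, a leftover thread in a listing points back at the component that created it.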
> 0.92 build has been failing pretty consistently on TestMasterFailover....
> -------------------------------------------------------------------------
>                 Key: HBASE-5833
>                 URL: https://issues.apache.org/jira/browse/HBASE-5833
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.92.2
>         Attachments: 5833.txt, closehregions.txt
> Trunk seems fine but 0.92 fails on this test pretty regularly.  Running it locally, it seems to hang for me.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

