Date: Thu, 18 Jul 2013 05:32:48 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-8939) Hanging unit tests

    [ https://issues.apache.org/jira/browse/HBASE-8939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712027#comment-13712027 ]

stack commented on HBASE-8939:
------------------------------

I added a post-build task to the Apache builds that runs our zombie tracker from ./dev-tools/test-patch.sh. It caught one just now: https://builds.apache.org/job/HBase-TRUNK/4265/console

TestLogRollAbort won't shut down. It is a bit of a strange test in that it kills HDFS out from under us and tries to ensure we don't lose edits. We are stuck on a thread join.
It looks like it has a timer of two minutes, but oddly the test claims to have 'passed' early enough in the game:

Running org.apache.hadoop.hbase.regionserver.wal.TestLogRollAbort
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 147.206 sec

Here is where we are bound up:

{code}
"pool-1-thread-1" prio=10 tid=0x7614b400 nid=0x62da in Object.wait() [0x7774f000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x7fd2d268> (a org.apache.hadoop.hbase.util.JVMClusterUtil$RegionServerThread)
    at java.lang.Thread.join(Thread.java:1186)
    - locked <0x7fd2d268> (a org.apache.hadoop.hbase.util.JVMClusterUtil$RegionServerThread)
    at java.lang.Thread.join(Thread.java:1239)
    at org.apache.hadoop.hbase.util.JVMClusterUtil.shutdown(JVMClusterUtil.java:242)
    at org.apache.hadoop.hbase.LocalHBaseCluster.shutdown(LocalHBaseCluster.java:427)
    at org.apache.hadoop.hbase.MiniHBaseCluster.shutdown(MiniHBaseCluster.java:495)
    at org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:742)
    at org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:711)
    at org.apache.hadoop.hbase.regionserver.wal.TestLogRollAbort.tearDown(TestLogRollAbort.java:114)
{code}

But we are also stuck here in setup:

{code}
"LeaseChecker@DFSClient[clientName=DFSClient_1663452662, ugi=jenkins]: java.lang.Throwable: for testing
    at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.toString(DFSClient.java:1393)
    at org.apache.hadoop.util.Daemon.<init>(Daemon.java:38)
    at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.put(DFSClient.java:1306)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:716)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:182)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:555)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:536)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:443)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:435)
    at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:476)
    at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:361)
    at org.apache.hadoop.hbase.HBaseTestingUtility.createRootDir(HBaseTestingUtility.java:773)
    at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:645)
    at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:627)
    at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:575)
    at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:562)
    at org.apache.hadoop.hbase.regionserver.wal.TestLogRollAbort.setUp(TestLogRollAbort.java:102)
{code}

We were in the middle of both setup and shutdown when the thread dump was taken.

I'm going to disable this test for now so we get clean builds.

> Hanging unit tests
> ------------------
>
>                 Key: HBASE-8939
>                 URL: https://issues.apache.org/jira/browse/HBASE-8939
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>            Reporter: stack
>             Fix For: 0.95.2
>
>         Attachments: 8939.txt
>
>
> We have hanging tests. Here are a few from this morning's review:
> {code}
> durruti:0.95 stack$ ./dev-support/findHangingTest.sh https://builds.apache.org/job/hbase-0.95-on-hadoop2/176/consoleText
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 3300k    0 3300k    0     0   508k      0 --:--:--  0:00:06 --:--:--  621k
> Hanging test: Running org.apache.hadoop.hbase.TestIOFencing
> Hanging test: Running org.apache.hadoop.hbase.regionserver.wal.TestLogRolling
> {code}
> And...
> {code}
> durruti:0.95 stack$ ./dev-support/findHangingTest.sh http://54.241.6.143/job/HBase-TRUNK-Hadoop-2/396/consoleText
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100  779k    0  779k    0     0   538k      0 --:--:--  0:00:01 --:--:--  559k
> Hanging test: Running org.apache.hadoop.hbase.TestIOFencing
> Hanging test: Running org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
> Hanging test: Running org.apache.hadoop.hbase.client.TestFromClientSide3
> {code}
> and....
> {code}
> durruti:0.95 stack$ ./dev-support/findHangingTest.sh http://54.241.6.143/job/HBase-0.95/607/consoleText
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100  445k    0  445k    0     0   490k      0 --:--:-- --:--:-- --:--:--  522k
> Hanging test: Running org.apache.hadoop.hbase.replication.TestReplicationDisableInactivePeer
> Hanging test: Running org.apache.hadoop.hbase.master.TestAssignmentManager
> Hanging test: Running org.apache.hadoop.hbase.util.TestHBaseFsck
> Hanging test: Running org.apache.hadoop.hbase.regionserver.TestStoreFileBlockCacheSummary
> Hanging test: Running org.apache.hadoop.hbase.IntegrationTestDataIngestSlowDeterministic
> {code}
> and...
> {code}
> durruti:0.95 stack$ ./dev-support/findHangingTest.sh http://54.241.6.143/job/HBase-0.95-Hadoop-2/607/consoleText
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100  781k    0  781k    0     0   240k      0 --:--:--  0:00:03 --:--:--  244k
> Hanging test: Running org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint
> Hanging test: Running org.apache.hadoop.hbase.client.TestFromClientSide
> Hanging test: Running org.apache.hadoop.hbase.TestIOFencing
> Hanging test: Running org.apache.hadoop.hbase.master.TestMasterFailoverBalancerPersistence
> Hanging test: Running org.apache.hadoop.hbase.master.TestDistributedLogSplitting
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
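
The findHangingTest.sh runs quoted in the issue description list test classes that printed a "Running ..." line in the build console but apparently never reported a result. The script itself is not shown in this thread, so what follows is only a minimal Python sketch of one way such a scan could work, not the actual implementation. The function name `find_hanging_tests` and both regular expressions are hypothetical. The sketch assumes that a per-class result line contains "Time elapsed:" and either names its class with a trailing " - in <class>" or, when it names no class, belongs to the most recently started test.

```python
import re

# Hypothetical patterns for surefire-style console output (assumptions, not
# taken from findHangingTest.sh).
RUNNING = re.compile(r"^Running (\S+)\s*$")
RESULT = re.compile(r"^Tests run: .*Time elapsed: .*?(?: - in (\S+))?\s*$")


def find_hanging_tests(console_text):
    """Return test classes that started but never printed a result line."""
    pending = []  # classes that have started, in order of appearance
    for raw in console_text.splitlines():
        line = raw.strip()
        started = RUNNING.match(line)
        if started:
            pending.append(started.group(1))
            continue
        finished = RESULT.match(line)
        if finished and pending:
            # Prefer the class named on the result line; otherwise assume the
            # result belongs to the most recently started class.
            name = finished.group(1) or pending[-1]
            if name in pending:
                pending.remove(name)
    return pending


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python find_hanging.py < consoleText
    for test in find_hanging_tests(sys.stdin.read()):
        print("Hanging test: Running " + test)
```

A scan like this only sees tests that never print a result line. A case like TestLogRollAbort in the comment above, which reported "Tests run: 1 ..." while its JVM stayed stuck in tearDown, would not be flagged; spotting that needs a check for leftover processes, which is what the zombie tracker in test-patch.sh is described as doing.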