Date: Thu, 31 Oct 2013 22:56:17 +0000 (UTC)
From: "Ted Yu (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-9863) Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs

    [ https://issues.apache.org/jira/browse/HBASE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810833#comment-13810833 ]

Ted Yu commented on HBASE-9863:
-------------------------------

The object monitor was held by the following call:
{code}
"FifoRpcScheduler.handler1-thread-4" daemon prio=10 tid=0x09479800 nid=0x2b48 waiting on condition [0x6f157000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:148)
        - locked <0xd399b5a8> (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:753)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:134)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:118)
        - locked <0x7f7828f0> (a org.apache.hadoop.hbase.master.TableNamespaceManager)
{code}
Many other threads were blocked on 0x7f7828f0:
{code}
"FifoRpcScheduler.handler1-thread-5" daemon prio=10 tid=0x71ba6400 nid=0x35a7 waiting for monitor entry [0x6ef20000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:250)
        - waiting to lock <0x7f7828f0> (a org.apache.hadoop.hbase.master.TableNamespaceManager)
        at org.apache.hadoop.hbase.master.HMaster.isTableNamespaceManagerReady(HMaster.java:3146)
        at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3105)
{code}

> Intermittently TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry hangs
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-9863
>                 URL: https://issues.apache.org/jira/browse/HBASE-9863
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>
> TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry sometimes hung.
> Here were two recent occurrences:
> https://builds.apache.org/job/PreCommit-HBASE-Build/7676/console
> https://builds.apache.org/job/PreCommit-HBASE-Build/7671/console
> There were 9 occurrences of the following in both stack traces:
> {code}
> "FifoRpcScheduler.handler1-thread-5" daemon prio=10 tid=0x09df8800 nid=0xc17 waiting for monitor entry [0x6fdf8000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>       at org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:250)
>       - waiting to lock <0x7f69b5f0> (a org.apache.hadoop.hbase.master.TableNamespaceManager)
>       at org.apache.hadoop.hbase.master.HMaster.isTableNamespaceManagerReady(HMaster.java:3146)
>       at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3105)
>       at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1743)
>       at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1782)
>       at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38221)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:1983)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
> {code}
> The test hung here:
> {code}
> "pool-1-thread-1" prio=10 tid=0x74f7b800 nid=0x5aa5 in Object.wait() [0x74efe000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0xcc848348> (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
>       at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1436)
>       - locked <0xcc848348> (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
>       at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
>       at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
>       at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.createTable(MasterProtos.java:40372)
>       at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.createTable(HConnectionManager.java:1931)
>       at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:598)
>       at org.apache.hadoop.hbase.client.HBaseAdmin$2.call(HBaseAdmin.java:594)
>       at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
>       - locked <0x7faa26d0> (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
>       at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
>       - locked <0x7faa26d0> (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
>       at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3124)
>       at org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsync(HBaseAdmin.java:594)
>       at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:485)
>       at org.apache.hadoop.hbase.TestZooKeeper.testRegionAssignmentAfterMasterRecoveryDueToZKExpiry(TestZooKeeper.java:486)
> {code}

--
This message was sent by Atlassian JIRA
(v6.1#6144)
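
For readers unfamiliar with the pattern in the thread dumps above, here is a minimal sketch of how one thread that waits inside a synchronized method leaves every other caller of a synchronized method on the same object in the BLOCKED state. It is not HBase code. The names MonitorHoldDemo, Manager, get() and isReady() are hypothetical stand-ins for TableNamespaceManager and its methods, and a CountDownLatch stands in for the Thread.sleep() between retries so that the demo finishes deterministically.

```java
import java.util.concurrent.CountDownLatch;

public class MonitorHoldDemo {

    /** Hypothetical stand-in for TableNamespaceManager; not the HBase class. */
    static class Manager {
        private final CountDownLatch holding = new CountDownLatch(1);
        private final CountDownLatch release = new CountDownLatch(1);

        // Plays the role of get(): the monitor stays held while the thread waits.
        // (The real trace sleeps between retries; a latch is used here instead.)
        synchronized void get() throws InterruptedException {
            holding.countDown();   // signal that the monitor is now owned
            release.await();       // waiting here does NOT release the monitor
        }

        // Plays the role of isTableAvailableAndInitialized(): needs the same monitor.
        synchronized boolean isReady() {
            return true;
        }
    }

    /** Returns the state observed for a thread calling isReady() while get() is pending. */
    static Thread.State observeWaiterState() throws InterruptedException {
        Manager m = new Manager();
        Thread holder = new Thread(() -> {
            try {
                m.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "holder");
        Thread waiter = new Thread(() -> m.isReady(), "waiter");

        holder.start();
        m.holding.await();         // holder owns the monitor from here on
        waiter.start();

        // Poll (up to 5 seconds) until the waiter is parked on the monitor.
        Thread.State state = waiter.getState();
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
            Thread.sleep(10);
            state = waiter.getState();
        }

        m.release.countDown();     // let the holder finish so both threads exit
        holder.join();
        waiter.join();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiter state: " + observeWaiterState());
    }
}
```

Running main should report the waiter as BLOCKED, which matches the "waiting for monitor entry" handlers in the dumps. The sketch only illustrates the symptom; it makes no claim about how the issue was or should be fixed.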