From: "Jean-Daniel Cryans (JIRA)"
To: issues@hbase.apache.org
Date: Fri, 11 Mar 2011 00:54:59 +0000 (UTC)
Subject: [jira] Commented: (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no

[ 
https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005438#comment-13005438 ]

Jean-Daniel Cryans commented on HBASE-3621:
-------------------------------------------

For example:

{code}
"somenode.prod.twitter.com:60000.timeoutMonitor" daemon prio=10 tid=0x00002aacb8567800 nid=0x772 in Object.wait() [0x0000000045bf1000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:485)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
	- locked <0x00002aaab2a10da8> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
	at $Proxy6.closeRegion(Unknown Source)
	at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:589)
	at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1093)
	at org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1672)
	- locked <0x00002aaabf759858> (a java.util.concurrent.ConcurrentSkipListMap)
	at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
...

"main-EventThread" daemon prio=10 tid=0x00002aacb850b000 nid=0x761 waiting for monitor entry [0x00000000455eb000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:525)
	- waiting to lock <0x00002aaabf759858> (a java.util.concurrent.ConcurrentSkipListMap)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:268)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
{code}

The ZK event thread is blocked by the timeout monitor thread, which holds the lock on regionsInTransition (the ConcurrentSkipListMap <0x00002aaabf759858>) while it waits on a closeRegion RPC to a region server (RS) that doesn't answer. As a result, all ZK events get severely delayed.
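To see why one unanswered RPC stalls the event thread, here is a minimal, self-contained sketch (not HBase code) that mimics the two threads in the dump. A thread named `timeoutMonitor` takes the monitor of a `ConcurrentSkipListMap` standing in for regionsInTransition and then sleeps to simulate an RS that doesn't answer; a second thread named `main-EventThread` tries to synchronize on the same map, as nodeDataChanged() does. The class name, `slowRpc`, and the 500 ms "RPC" are hypothetical, chosen only for illustration.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CountDownLatch;

final class LockHeldDuringRpcDemo {
    // Stand-in for regionsInTransition; its object monitor is the contended lock.
    static final ConcurrentSkipListMap<String, String> regionsInTransition = new ConcurrentSkipListMap<>();

    // Hypothetical stand-in for a closeRegion RPC to a region server that doesn't answer.
    static void slowRpc(long millis) throws InterruptedException {
        Thread.sleep(millis);
    }

    // Returns the state of the "event" thread observed while the "timeout monitor" holds the lock.
    static Thread.State observeEventThreadState() {
        try {
            CountDownLatch lockTaken = new CountDownLatch(1);
            Thread timeoutMonitor = new Thread(() -> {
                synchronized (regionsInTransition) {
                    lockTaken.countDown();
                    try {
                        slowRpc(500); // the "RPC" runs with the lock held: the bug pattern
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, "timeoutMonitor");
            Thread eventThread = new Thread(() -> {
                synchronized (regionsInTransition) { // needs the same lock, like nodeDataChanged()
                    regionsInTransition.put("region-a", "OPENED");
                }
            }, "main-EventThread");

            timeoutMonitor.start();
            lockTaken.await(); // make sure the monitor is held before the event thread starts
            eventThread.start();

            // Poll briefly (well under the 500 ms "RPC") until the event thread is parked on the monitor.
            Thread.State state = eventThread.getState();
            long deadline = System.nanoTime() + 300_000_000L;
            while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(5);
                state = eventThread.getState();
            }
            timeoutMonitor.join();
            eventThread.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("main-EventThread while the RPC is in flight: " + observeEventThreadState());
    }
}
```

The event thread reports BLOCKED for as long as the simulated RPC lasts, which is the same "waiting to lock" picture as in the dump above; with a real RS that never answers, that wait is bounded only by the RPC timeout.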
> The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no
> -------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3621
>                 URL: https://issues.apache.org/jira/browse/HBASE-3621
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.90.2
>
>
> J-D found this debugging a failure on Dmitriy's cluster; we're RPC'ing under a synchronized(regionsInTransition). Fix.
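The fix the issue asks for follows a common pattern: decide what to do while holding the lock, then issue the RPCs after releasing it. Below is a minimal sketch of that pattern, not the actual HBASE-3621 patch. `TimeoutChoreSketch`, `chore(now, unassignRpc)`, and the timestamp-only map are hypothetical simplifications of AssignmentManager's TimeoutMonitor and regionsInTransition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.Consumer;

final class TimeoutChoreSketch {
    // Hypothetical simplification: region name -> time (ms) it entered its current transition state.
    final ConcurrentSkipListMap<String, Long> regionsInTransition = new ConcurrentSkipListMap<>();
    private final long timeoutMillis;

    TimeoutChoreSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // One pass of the chore; unassignRpc is a hypothetical stand-in for the closeRegion RPC.
    // Returns the regions it tried to unassign, in key order.
    List<String> chore(long now, Consumer<String> unassignRpc) {
        List<String> timedOut = new ArrayList<>();
        // Phase 1: read shared state under the lock, but do no I/O here.
        synchronized (regionsInTransition) {
            for (Map.Entry<String, Long> e : regionsInTransition.entrySet()) {
                if (now - e.getValue() > timeoutMillis) {
                    timedOut.add(e.getKey());
                }
            }
        }
        // Phase 2: the lock is released, so a slow RS stalls only this thread,
        // and ZK event handlers can keep updating regionsInTransition meanwhile.
        for (String region : timedOut) {
            unassignRpc.accept(region);
        }
        return timedOut;
    }

    public static void main(String[] args) {
        TimeoutChoreSketch sketch = new TimeoutChoreSketch(1_000);
        sketch.regionsInTransition.put("r1", 0L);     // in transition for 6 s at now=6000: timed out
        sketch.regionsInTransition.put("r2", 5_000L); // in transition for exactly 1 s: not yet
        List<String> sent = sketch.chore(6_000, region ->
            System.out.println("unassign " + region + ", lock held: "
                + Thread.holdsLock(sketch.regionsInTransition)));
        System.out.println("sent: " + sent);
    }
}
```

Running `main` unassigns only `r1`, and `Thread.holdsLock` reports that the lock is not held during the simulated RPC. The trade-off is that a region may change state between phase 1 and phase 2 (for example, it finishes opening just after being selected), so real code would need to re-check or tolerate that; this sketch does not.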