hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no
Date Fri, 11 Mar 2011 00:54:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005438#comment-13005438 ]

Jean-Daniel Cryans commented on HBASE-3621:
-------------------------------------------

For example:

{code}
"somenode.prod.twitter.com:60000.timeoutMonitor" daemon prio=10 tid=0x00002aacb8567800 nid=0x772 in Object.wait() [0x0000000045bf1000]
   java.lang.Thread.State: WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  at java.lang.Object.wait(Object.java:485)
  at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
  - locked <0x00002aaab2a10da8> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
  at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
  at $Proxy6.closeRegion(Unknown Source)
  at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:589)
  at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1093)
  at org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1672)
  - locked <0x00002aaabf759858> (a java.util.concurrent.ConcurrentSkipListMap)
  at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
...

"main-EventThread" daemon prio=10 tid=0x00002aacb850b000 nid=0x761 waiting for monitor entry [0x00000000455eb000]
   java.lang.Thread.State: BLOCKED (on object monitor)
  at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:525)
  - waiting to lock <0x00002aaabf759858> (a java.util.concurrent.ConcurrentSkipListMap)
  at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:268)
  at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
{code}

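The ZK event thread is blocked by the timeoutMonitor thread, which holds the lock on regionsInTransition while it waits on a closeRegion RPC to a RS that doesn't answer.
Until that RPC returns, all ZK events get severely delayed.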
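
To illustrate the general way out of the pattern in the stack traces above, here is a minimal sketch that only collects the timed-out regions while holding the lock and does the slow call after releasing it. This is not the actual HBASE-3621 patch: the class TimeoutMonitorSketch, its methods, and the Consumer standing in for the closeRegion RPC are all hypothetical names made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.Consumer;

// Hypothetical stand-in for the timeout monitor; not the real AssignmentManager code.
class TimeoutMonitorSketch {
    // Assumed shape: region name -> time (ms) at which the region entered transition.
    private final ConcurrentSkipListMap<String, Long> regionsInTransition =
        new ConcurrentSkipListMap<>();
    private final long timeoutMs;
    // Stand-in for the blocking closeRegion RPC to a region server.
    private final Consumer<String> closeRegionRpc;

    TimeoutMonitorSketch(long timeoutMs, Consumer<String> closeRegionRpc) {
        this.timeoutMs = timeoutMs;
        this.closeRegionRpc = closeRegionRpc;
    }

    void markInTransition(String region, long startedAtMs) {
        synchronized (regionsInTransition) {
            regionsInTransition.put(region, startedAtMs);
        }
    }

    // Lets a caller check whether the current thread holds the RIT monitor.
    boolean holdsRitLock() {
        return Thread.holdsLock(regionsInTransition);
    }

    // One pass of the monitor: returns the regions it considered timed out.
    List<String> chore(long nowMs) {
        List<String> timedOut = new ArrayList<>();
        // Step 1: under the lock, only inspect the map and remember what timed out.
        synchronized (regionsInTransition) {
            for (Map.Entry<String, Long> e : regionsInTransition.entrySet()) {
                if (nowMs - e.getValue() > timeoutMs) {
                    timedOut.add(e.getKey());
                }
            }
        }
        // Step 2: the lock is released, so a slow or dead server stalls only this
        // thread and not anything else that needs regionsInTransition.
        for (String region : timedOut) {
            closeRegionRpc.accept(region);
        }
        return timedOut;
    }

    public static void main(String[] args) {
        TimeoutMonitorSketch monitor = new TimeoutMonitorSketch(
            1000L, region -> System.out.println("closeRegion(" + region + ")"));
        monitor.markInTransition("region-a", 0L);
        monitor.markInTransition("region-b", 4500L);
        System.out.println("timed out: " + monitor.chore(5000L));
    }
}
```

The trade-off in this sketch is that a region's state can change between the scan and the call, so a real implementation would presumably need to re-check the state before acting on it.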

> The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no
> -------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3621
>                 URL: https://issues.apache.org/jira/browse/HBASE-3621
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.90.2
>
>
> J-D found this debugging a failure on Dmitriy's cluster; we're RPC'ing under a synchronized(regionsInTransition). Fix.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
