hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11460) Deadlock in HMaster on masterAndZKLock in HConnectionManager
Date Fri, 04 Jul 2014 14:24:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052475#comment-14052475 ]

Hadoop QA commented on HBASE-11460:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12653968/11460-v1.txt
  against trunk revision .
  ATTACHMENT ID: 12653968

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9970//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9970//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9970//console

This message is automatically generated.

> Deadlock in HMaster on masterAndZKLock in HConnectionManager
> ------------------------------------------------------------
>
>                 Key: HBASE-11460
>                 URL: https://issues.apache.org/jira/browse/HBASE-11460
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.96.0
>            Reporter: Andrey Stepachev
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.99.0
>
>         Attachments: 11460-v1.txt, threads.tdump
>
>
> On one of our clusters we got a deadlock in HMaster.
> In a nutshell, the deadlock is caused by using one HConnectionManager both for serving client-like calls and for calls from HMaster RPC handlers.
> HBaseAdmin uses HConnectionManager, which takes the lock masterAndZKLock.
> On the other side sits TableNamespaceManager (TNM). This class uses HConnectionManager too (in my case, to get the list of available namespaces).
> The problem is that the HMaster class uses TNM to serve RPC requests.
> If we look at TNM more closely, we can see that this class is fully synchronized.
> That gives us a problem.
> The web interface issues a request via HConnectionManager and locks masterAndZKLock.
> The connection is blocking, so RpcClient waits for the reply while holding the lock.
> This is how it looks in the thread dump:
> {code}
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000000c8905430> (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
> 	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1435)
> 	- locked <0x00000000c8905430> (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
> 	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653)
> 	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711)
> 	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1467)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2093)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1819)
> 	- locked <0x00000000d15dc668> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.client.HBaseAdmin$MasterCallable.prepare(HBaseAdmin.java:3187)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:119)
> 	- locked <0x00000000cd0c1238> (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:96)
> 	- locked <0x00000000cd0c1238> (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
> 	at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3214)
> 	at org.apache.hadoop.hbase.client.HBaseAdmin.listTableDescriptorsByNamespace(HBaseAdmin.java:2265)
> {code}
> Some other client calls any HMaster RPC, which calls TableNamespaceManager methods, which in turn block on the HConnectionManager global lock masterAndZKLock.
> This is how it looks:
> {code}
>   java.lang.Thread.State: BLOCKED (on object monitor)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveZooKeeperWatcher(HConnectionManager.java:1699)
> 	- waiting to lock <0x00000000d15dc668> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.client.ZooKeeperRegistry.isTableOnlineState(ZooKeeperRegistry.java:100)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isTableDisabled(HConnectionManager.java:874)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:1027)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:852)
> 	at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:72)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:119)
> 	- locked <0x00000000cd0ef108> (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
> 	at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:705)
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:144)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1102)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1162)
> 	- locked <0x00000000d1b49fd8> (a java.lang.Object)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1054)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1011)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:852)
> 	at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:72)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:119)
> 	- locked <0x00000000cd0ef248> (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
> 	at org.apache.hadoop.hbase.client.HTable.get(HTable.java:756)
> 	at org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:134)
> 	at org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:118)
> 	- locked <0x00000000d189da20> (a org.apache.hadoop.hbase.master.TableNamespaceManager)
> 	at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3113)
> 	at org.apache.hadoop.hbase.master.HMaster.listTableDescriptorsByNamespace(HMaster.java:3133)
> 	at org.apache.hadoop.hbase.master.HMaster.listTableDescriptorsByNamespace(HMaster.java:3034)
> 	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38261)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
> 	at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
> {code}
> Finally, the original handler, which should serve the request from the web UI, can block on TNM methods, effectively forming a deadlock.
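
The cycle described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not HBase code: `LockCycleSketch`, `holdThenTry` and `cycleForms` are hypothetical names, the two `ReentrantLock` fields merely stand in for masterAndZKLock and the TableNamespaceManager monitor, and the blocking RPC is simplified to a direct attempt on the second lock. A 200 ms timeout is used so that the demo terminates; in the reported case there is no such timeout, so the threads hang.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class LockCycleSketch {
    // Hypothetical stand-ins for the two locks named in the report.
    static final ReentrantLock masterAndZKLock = new ReentrantLock();
    static final ReentrantLock tnmLock = new ReentrantLock();

    /** Holds {@code first}, then tries {@code second} with a timeout; returns true on timeout. */
    static boolean holdThenTry(ReentrantLock first, ReentrantLock second,
                               CountDownLatch bothHolding, CountDownLatch bothTried) {
        first.lock();
        try {
            bothHolding.countDown();
            bothHolding.await();   // wait until the other thread holds its first lock too
            boolean acquired = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await();     // keep `first` held until both attempts have finished
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            first.unlock();
        }
    }

    /** Returns true if both threads timed out, i.e. each was waiting on the other's lock. */
    static boolean cycleForms() {
        CountDownLatch bothHolding = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] timedOut = new boolean[2];

        // Models the web UI call: holds masterAndZKLock while its request needs TNM.
        Thread webUi = new Thread(() ->
            timedOut[0] = holdThenTry(masterAndZKLock, tnmLock, bothHolding, bothTried), "web-ui");
        // Models the RPC handler: holds the TNM monitor while it needs masterAndZKLock.
        Thread rpcHandler = new Thread(() ->
            timedOut[1] = holdThenTry(tnmLock, masterAndZKLock, bothHolding, bothTried), "rpc-handler");

        webUi.start();
        rpcHandler.start();
        try {
            webUi.join();
            rpcHandler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return timedOut[0] && timedOut[1];
    }

    public static void main(String[] args) {
        System.out.println("cycle forms: " + cycleForms());
    }
}
```

Because neither thread releases its first lock until both attempts have completed, both attempts time out, which is the bounded analogue of the hang in the thread dumps.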



--
This message was sent by Atlassian JIRA
(v6.2#6252)
