hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14241) Fix deadlock during cluster shutdown due to concurrent connection close
Date Thu, 20 Aug 2015 12:47:49 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704799#comment-14704799 ]

Hudson commented on HBASE-14241:
--------------------------------

FAILURE: Integrated in HBase-1.2 #120 (See [https://builds.apache.org/job/HBase-1.2/120/])
HBASE-14241 Fix deadlock during cluster shutdown due to concurrent connection close (tedyu:
rev 639018a857a5e58f56d1db45e3f2d0e6043e2650)
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClientImpl.java


> Fix deadlock during cluster shutdown due to concurrent connection close
> -----------------------------------------------------------------------
>
>                 Key: HBASE-14241
>                 URL: https://issues.apache.org/jira/browse/HBASE-14241
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.0.2
>            Reporter: Andrew Purtell
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.2, 1.3.0
>
>         Attachments: 14241-v2.txt, 14241-v3.txt, 14241-v4.txt, 14241-v5.txt, deadlock.txt.gz
>
>
> Caught while testing branch-1.0, during shutdown of TestMasterMetricsWrapper.
> Found one Java-level deadlock:
> =============================
> "MASTER_META_SERVER_OPERATIONS-ip-10-32-130-237:55342-0":
>   waiting to lock monitor 0x00007f2a040051c8 (object 0x00000007e36108a8, a org.apache.hadoop.hbase.util.PoolMap),
>   which is held by "M:0;ip-10-32-130-237:55342"
> "M:0;ip-10-32-130-237:55342":
>   waiting to lock monitor 0x00007f2a04005118 (object 0x00000007e3610b00, a org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection),
>   which is held by "MASTER_META_SERVER_OPERATIONS-ip-10-32-130-237:55342-0"
> Full stack dump and deadlock debug output attached.
> Root cause:
> In RpcClientImpl#close(), we first obtain the lock on connections:
> {code}
>     synchronized (connections) {
>       for (Connection conn : connections.values()) {
> {code}
> markClosed() then tries to obtain the lock on each connection object:
> {code}
>         if (!conn.isAlive()) {
>           conn.markClosed(new InterruptedIOException("RpcClient is closing"));
>           conn.close();
> {code}
> Another thread, MetaServerShutdownHandler, calls RpcClientImpl$Connection#setupIOstreams(), where:
> {code}
>         markClosed(e);
>         close();
> {code}
> Here the lock on the connection object is obtained first and the lock on connections second, the reverse of the order in close(), leading to deadlock:
> {code}
>       synchronized (connections) {
>         connections.removeValue(remoteId, this);
>       }
> {code}
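
This is a minimal sketch of one general way to break this kind of lock-order inversion. It assumes simplified, hypothetical classes `SketchClient` and `SketchConn`. These are not the real HBase types, and the sketch is not necessarily what the committed patch does.

In the sketch, `close()` copies the connections while holding the map lock, then releases that lock before taking any connection monitor. As a result, any thread that holds both monitors acquires them in the same order: connection first, then map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the two locks in the report (NOT the real
// HBase classes): "connections" stands in for the PoolMap monitor, and each
// SketchConn instance stands in for an RpcClientImpl$Connection monitor.
class SketchClient {
  final Map<String, SketchConn> connections = new HashMap<>();

  SketchConn open(String id) {
    SketchConn c = new SketchConn(this, id);
    synchronized (connections) {
      connections.put(id, c);
    }
    return c;
  }

  // Lock-order-safe close: snapshot the connections under the map lock, then
  // release it BEFORE taking any connection monitor. A thread that holds a
  // connection monitor and wants the map lock can therefore always proceed.
  void close() {
    List<SketchConn> snapshot;
    synchronized (connections) {
      snapshot = new ArrayList<>(connections.values());
    }
    for (SketchConn conn : snapshot) {
      conn.markClosedAndRemove("client is closing");
    }
  }
}

class SketchConn {
  private final SketchClient client;
  private final String id;
  private boolean closed;

  SketchConn(SketchClient client, String id) {
    this.client = client;
    this.id = id;
  }

  // Mirrors the setupIOstreams() failure path: holds this connection's
  // monitor, then takes the map lock to deregister itself. The reason string
  // is ignored in this sketch; the real code records the causing exception.
  synchronized void markClosedAndRemove(String reason) {
    if (closed) {
      return; // already closed by the other path
    }
    closed = true;
    synchronized (client.connections) {
      client.connections.remove(id, this);
    }
  }

  synchronized boolean isClosed() {
    return closed;
  }
}
```

With this ordering, the scenario from the report (one thread in `close()`, another failing a connection) can no longer form a cycle. The cost is that a connection registered after the snapshot is taken is not visited by that `close()` call.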

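The "Found one Java-level deadlock" report above comes from a thread dump. A test harness could detect the same kind of monitor cycle programmatically with the standard `java.lang.management.ThreadMXBean#findMonitorDeadlockedThreads()` API. The following is an illustrative sketch only; the `DeadlockProbe` class is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of HBase) that reports threads stuck in a
// cycle of object-monitor waits, similar in spirit to jstack's output.
final class DeadlockProbe {
  private DeadlockProbe() {}

  // Returns one line per deadlocked thread; the list is empty if none is found.
  static List<String> describeMonitorDeadlocks() {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock
    List<String> lines = new ArrayList<>();
    if (ids == null) {
      return lines;
    }
    for (ThreadInfo info : mx.getThreadInfo(ids)) {
      if (info == null) {
        continue; // thread exited between the two calls
      }
      lines.add("\"" + info.getThreadName() + "\": waiting to lock "
          + info.getLockName() + ", which is held by \""
          + info.getLockOwnerName() + "\"");
    }
    return lines;
  }
}
```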


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
