zookeeper-dev mailing list archives

From "Flavio Paiva Junqueira (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-140) Deadlock in QuorumCnxManager
Date Sat, 13 Sep 2008 15:44:44 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630783#action_12630783 ]

Flavio Paiva Junqueira commented on ZOOKEEPER-140:
--------------------------------------------------

It seems to me that two synchronized blocks are unnecessary: the one in sendTo() around the
call to initiateConnection, and the one entered when a new connection arrives, around the
subsequent call to receiveConnection. Both methods synchronize again on senderWorkerMap when
it is time to update the bookkeeping information on the connections. Removing these two blocks
prevents the problem pointed out in this jira. I have tested the change and it seems to work,
and the logic also looks correct to me.
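
A minimal sketch of the narrower locking described above. This is not the actual
QuorumCnxManager code: the class name, the handshake() placeholder, the register() helper and
the duplicate check are all assumptions made for illustration. The point is only that the
blocking exchange runs with no lock held, and the lock on senderWorkerMap is taken just for
the bookkeeping update.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; names and structure are assumptions, not ZooKeeper source.
class ConnectionBookkeeping {
    // Stands in for senderWorkerMap: peer id -> label of the recorded connection.
    private final Map<Long, String> senderWorkerMap = new HashMap<>();

    // Placeholder for the challenge exchange, which may block on the remote peer.
    // It is called with no lock held.
    private String handshake(long peerId, String direction) {
        return direction + "-" + peerId;
    }

    // Not synchronized as a whole: only the map update below takes the lock.
    public boolean initiateConnection(long peerId) {
        String conn = handshake(peerId, "out");
        return register(peerId, conn);
    }

    public boolean receiveConnection(long peerId) {
        String conn = handshake(peerId, "in");
        return register(peerId, conn);
    }

    // Returns true if the connection was recorded, false if one already existed
    // for this peer (the duplicate check is an assumption of this sketch).
    private boolean register(long peerId, String conn) {
        synchronized (senderWorkerMap) {
            if (senderWorkerMap.containsKey(peerId)) {
                return false;
            }
            senderWorkerMap.put(peerId, conn);
            return true;
        }
    }

    public int size() {
        synchronized (senderWorkerMap) {
            return senderWorkerMap.size();
        }
    }
}
```

In this sketch a thread stuck in handshake() holds no lock, so an incoming connection from
another peer can still be accepted and recorded.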

I will postpone submitting a patch because I'd like to have a patch for 127 reviewed and committed
first. 

> Deadlock in QuorumCnxManager
> ----------------------------
>
>                 Key: ZOOKEEPER-140
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-140
>             Project: Zookeeper
>          Issue Type: Bug
>            Reporter: Flavio Paiva Junqueira
>
> Frequently the servers deadlock in QuorumCnxManager:initiateConnection on
> s.read(msgBuffer) when reading the challenge from the peer.
> Calls to initiateConnection and receiveConnection are synchronized, so only one or the
> other can be executing at a time. This prevents two connections from opening between the same
> pair of servers.
> However, it seems that this leads to deadlock, as in this scenario:
> {noformat}
> A (initiate --> B)
> B (initiate --> C)
> C (initiate --> A)
> {noformat}
> initiateConnection can only complete when receiveConnection runs on the remote peer and
> answers the challenge. If all servers are blocked in initiateConnection, receiveConnection
> never runs and leader election halts.
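
The scenario above can be read as a cycle in a wait-for graph: A waits for B, B waits for C,
and C waits for A. The following is a small illustrative sketch of that reading, not code from
ZooKeeper. The class WaitForCycle and the method isDeadlocked are hypothetical names, and each
server is assumed to wait on at most one peer at a time.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the A -> B -> C -> A scenario as a wait-for graph.
class WaitForCycle {
    // Returns true if following the wait-for edges from start leads back to start,
    // i.e. start is part of a cycle and can never make progress.
    public static boolean isDeadlocked(Map<String, String> waitsFor, String start) {
        Set<String> seen = new HashSet<>();
        String cur = start;
        // Walk the chain until it ends (null) or revisits a server.
        while (cur != null && seen.add(cur)) {
            cur = waitsFor.get(cur);
        }
        return start.equals(cur);
    }

    public static void main(String[] args) {
        Map<String, String> waitsFor = new HashMap<>();
        waitsFor.put("A", "B"); // A is blocked in initiateConnection toward B
        waitsFor.put("B", "C"); // B is blocked in initiateConnection toward C
        waitsFor.put("C", "A"); // C is blocked in initiateConnection toward A
        System.out.println(isDeadlocked(waitsFor, "A"));
    }
}
```

Removing any one edge, for example letting C run receiveConnection instead of blocking toward
A, breaks the cycle in this model.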

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

