zookeeper-dev mailing list archives

From "james strachan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-63) Race condition in client close() operation
Date Thu, 24 Jul 2008 10:01:31 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616426#action_12616426 ]

james strachan commented on ZOOKEEPER-63:

So this patch does not attempt to fix the race condition problem, apologies if I gave that
impression :)

What it does do, though, is act as a workaround: if a client is not able to properly
send a disconnect packet to the server for *any reason at all*, such as

* a hung socket (which can be quite common)
* no servers available
* a race condition in the ZK client code of some kind (which we definitely have now)

then the client application does not hang forever, as it's trying to close and shut down
anyway :). So it's a side benefit that it acts as a band-aid until someone fixes all the
possible race conditions and potential socket hangs.
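
To make the idea concrete, here is a minimal sketch of a close() that waits only a bounded
time for the disconnect to be acknowledged. It is not the attached patch and not the
ZooKeeper client API; the class BoundedCloseSketch, the onDisconnectAcked() callback and the
timeout parameter are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical client, for illustration only; not the real ZooKeeper class.
class BoundedCloseSketch {
    // Released once the (assumed) send thread sees the server acknowledge the disconnect.
    private final CountDownLatch disconnectAcked = new CountDownLatch(1);
    private volatile boolean closed = false;

    // Assumed callback, invoked when the disconnect packet has been acknowledged.
    void onDisconnectAcked() {
        disconnectAcked.countDown();
    }

    // Waits at most timeoutMs for a clean disconnect, then closes regardless.
    // Returns true if the disconnect was acknowledged in time, false otherwise.
    boolean close(long timeoutMs) {
        boolean clean = false;
        try {
            clean = disconnectAcked.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt for the caller and fall through to closing.
            Thread.currentThread().interrupt();
        }
        // Local resources (socket, threads) would be released here either way.
        closed = true;
        return clean;
    }

    boolean isClosed() {
        return closed;
    }
}
```

The return value lets a caller log an unclean shutdown without ever blocking on it.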

Let me put it another way. Given that the client is closing, is it really correct to leave
it potentially hanging around forever just because it cannot be sure if the disconnect packet
was received and properly processed by the server? If the socket is dead before the call to
close(), is it really correct to block until a connection can be re-established, just so it
can be properly closed - when the code will effectively close the hung socket without sending
a disconnect packet anyway :) ? 

The server has to detect and time out failed sessions, whether it receives an explicit disconnect
packet or not (as a process could just hang). So do we really need to be super strict on the
client side, forcing clients to block when trying to shut down if they can't do so cleanly
within some time period?
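
As a rough illustration of that argument, the sketch below shows a server-side tracker that
expires a session once it has been silent for longer than a timeout, whether or not an
explicit disconnect ever arrived. This is an assumption-laden toy, not ZooKeeper's actual
session tracking; SessionExpirySketch and its methods are hypothetical, and the clock is
passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical server-side session tracker, for illustration only.
class SessionExpirySketch {
    private final long timeoutMs;
    // Session id -> time (ms) the session was last heard from.
    private final Map<Long, Long> lastSeen = new HashMap<>();

    SessionExpirySketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Records any traffic (request or heartbeat) from a session.
    void touch(long sessionId, long nowMs) {
        lastSeen.put(sessionId, nowMs);
    }

    // Clean path: the client managed to send an explicit disconnect.
    void disconnect(long sessionId) {
        lastSeen.remove(sessionId);
    }

    // Fallback path: drops and returns every session silent for longer than the timeout.
    List<Long> expire(long nowMs) {
        List<Long> expired = new ArrayList<>();
        Iterator<Map.Entry<Long, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            if (nowMs - entry.getValue() > timeoutMs) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

Either path ends with the session gone on the server; the explicit disconnect only makes it
happen sooner.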

I totally agree that we should fix the race condition though :). I just wanted a workaround
to avoid my ZK test cases hanging forever due to the race condition :)

> Race condition in client close() operation
> ------------------------------------------
>                 Key: ZOOKEEPER-63
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-63
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: java client
>            Reporter: Patrick Hunt
>            Assignee: Benjamin Reed
>         Attachments: patch_ZOOKEEPER-63.patch
> There is a race condition in the Java close operation in ZooKeeper.java.
> The client sends a disconnect request to the server. The server will close any open connections
> with the client when it receives this. If the client has not yet shut down its subthreads
> (event/send threads, for example), these threads may consider the condition an error. We see
> this a lot in the tests, where the clients output error logs because they are unaware that a
> disconnection has been requested by the client.
> Ben mentioned: perhaps we just have to change state to closed (on client) before sending
> the disconnect request.
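
The ordering Ben suggests in the description above can be sketched roughly as follows. This
is only an illustration of the idea, not the actual fix; CloseOrderingSketch, its State enum
and the onConnectionLost() callback are hypothetical, and the server's reaction is simulated
by a direct method call.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical client, for illustration only; not the real ZooKeeper class.
class CloseOrderingSketch {
    enum State { CONNECTED, CLOSED }

    private final AtomicReference<State> state = new AtomicReference<>(State.CONNECTED);
    private int errorsLogged = 0;

    // Assumed callback from a send/event thread when the connection drops.
    void onConnectionLost() {
        if (state.get() == State.CLOSED) {
            return; // Expected: we asked for the disconnect ourselves.
        }
        errorsLogged++; // Unexpected loss; a real client would log an error here.
    }

    void close() {
        // Mark the client closed BEFORE the disconnect request goes out, so the
        // subthreads already know the connection loss that follows is intentional.
        state.set(State.CLOSED);
        sendDisconnectRequest();
    }

    // Stand-in for the network round trip: the server closes the connection in
    // response, which the subthreads observe as a lost connection.
    private void sendDisconnectRequest() {
        onConnectionLost();
    }

    int errorsLogged() {
        return errorsLogged;
    }
}
```

With the two statements in close() swapped, the simulated connection loss would be counted
as an error, which matches the spurious error logs described in the issue.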

