zookeeper-dev mailing list archives

From "Christian Wiedmann (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-542) c-client can spin when server unresponsive
Date Tue, 06 Oct 2009 18:06:31 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762713#action_12762713 ]

Christian Wiedmann commented on ZOOKEEPER-542:
----------------------------------------------

I don't really know how to write an automated test for this, since the spinning is not visible
outside of the API.  The manual test I used is to kill -STOP the server and then wait until
the client tries to reconnect, while running strace on the I/O thread (I'm using the Python
bindings, btw).  Pre-patch, the strace output shows repeated calls to poll with POLLOUT set on
the server fd.  Post-patch, POLLOUT is not set and there is no spinning.
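
For readers who want to see why POLLOUT on an otherwise idle socket shows up in strace as back-to-back poll calls, here is a minimal sketch. It does not use the ZooKeeper client at all: it assumes a local AF_UNIX socket pair as a stand-in for the server fd, and count_wakeups() is a hypothetical helper name chosen for illustration.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Poll one end of a connected socket pair `rounds` times with the given
 * event mask and a 10 ms timeout. Returns how many of those calls reported
 * an event, or -1 if the socket pair could not be created.
 * Hypothetical helper, not part of the ZooKeeper C client. */
int count_wakeups(short events, int rounds)
{
    int fds[2];
    int wakeups = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    for (int i = 0; i < rounds; i++) {
        struct pollfd pfd = { .fd = fds[0], .events = events, .revents = 0 };
        /* A writable socket satisfies POLLOUT immediately, so poll()
         * returns without waiting for the timeout. */
        if (poll(&pfd, 1, 10) > 0)
            wakeups++;
    }

    close(fds[0]);
    close(fds[1]);
    return wakeups;
}
```

Calling count_wakeups(POLLOUT, n) should report a wakeup on every round, because the socket's send buffer is empty and therefore writable, while count_wakeups(POLLIN, n) should wait out each timeout since nothing is ever sent. A loop that keeps asking for POLLOUT without writing anything behaves like the first case.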

> c-client can spin when server unresponsive
> ------------------------------------------
>
>                 Key: ZOOKEEPER-542
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-542
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>    Affects Versions: 3.2.0, 3.2.1
>            Reporter: Christian Wiedmann
>            Assignee: Christian Wiedmann
>             Fix For: 3.3.0
>
>         Attachments: ZOOKEEPER-542.patch, ZOOKEEPER-542.patch
>
>
> Due to a mismatch between zookeeper_interest() and zookeeper_process(), when the zookeeper
> server is unresponsive the client can spin when reconnecting to the server.
> In particular, zookeeper_interest() adds ZOOKEEPER_WRITE whenever there is data to be
> sent, but flush_send_queue() only writes the data if the state is ZOO_CONNECTED_STATE.  When
> in ZOO_ASSOCIATING_STATE, this results in spinning.
> This probably doesn't affect production, but I had a runaway process in a development
> deployment that caused performance issues on the node.  This is easy to reproduce in a single
> node environment by doing a kill -STOP on the server and waiting for the session timeout.
> Patch to be added.
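
The mismatch described above can be modelled in a few lines. This is only a sketch under stated assumptions: every name below (model_client, interest_always, interest_if_connected, flush_queue, wasted_iterations) is hypothetical and stands in for the real client code, and the "fixed" variant is just one approach consistent with the observation that POLLOUT is no longer requested after the patch, not necessarily what the attached patch does.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the client's session state and interest flags. */
enum model_state { MODEL_ASSOCIATING, MODEL_CONNECTED };
enum { MODEL_READ = 1, MODEL_WRITE = 2 };

struct model_client {
    enum model_state state;
    int queued;                 /* requests waiting to be sent */
};

/* Behaviour as described in the report: request write readiness whenever
 * data is queued, regardless of state. */
int interest_always(const struct model_client *c)
{
    return MODEL_READ | (c->queued > 0 ? MODEL_WRITE : 0);
}

/* Assumed fix: request write readiness only when a flush would send. */
int interest_if_connected(const struct model_client *c)
{
    bool can_send = c->state == MODEL_CONNECTED && c->queued > 0;
    return MODEL_READ | (can_send ? MODEL_WRITE : 0);
}

/* Mirrors the described flush: data is written only in the connected state.
 * Returns the number of queued items sent. */
int flush_queue(struct model_client *c)
{
    if (c->state != MODEL_CONNECTED)
        return 0;
    int sent = c->queued;
    c->queued = 0;
    return sent;
}

/* Run `rounds` event-loop iterations in the associating state with one
 * queued request, treating the socket as always writable. Counts iterations
 * that woke up for writing but sent nothing. */
int wasted_iterations(int (*interest)(const struct model_client *), int rounds)
{
    struct model_client c = { MODEL_ASSOCIATING, 1 };
    int wasted = 0;

    for (int i = 0; i < rounds; i++) {
        if ((interest(&c) & MODEL_WRITE) && flush_queue(&c) == 0)
            wasted++;
    }
    return wasted;
}
```

With interest_always every iteration is wasted, which corresponds to the spin; with interest_if_connected none are, because the loop never asks to be woken for a write it cannot perform.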

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

