hadoop-zookeeper-dev mailing list archives

From "Patrick Hunt (JIRA)" <j...@apache.org>
Subject [jira] Updated: (ZOOKEEPER-336) single bad client can cause server to stop accepting connections
Date Fri, 08 May 2009 20:52:45 GMT

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-336:

    Status: Open  (was: Patch Available)

I noticed the following issues:

1) It's great to see tests ;-), however the included test needs to be improved a bit. Two issues:

a) Under heavy load (like we see on hudson.zones.apache.org) the test may fail due to its use
of sleep.

We've found that using sleep in tests is a twofold problem. First, the test usually runs fine,
but under heavy load the sleep may not be long enough, resulting in a spurious test failure.
This is particularly onerous on hudson.zones.apache.org, which is a virtualized system and
gates commits. Second, sleep artificially inflates the run time of the test, making
agile/CI-based development painful (granted, test run time is something we need to improve
on, and we are working towards that).

Instead of using sleep, the test should first loop to start all the sockets, then loop to
check whether each socket either connected or failed to connect, counting the number of
successes and checking that against the expected count. The check should wait for a socket
result for at most some maximum period of time (say 60 sec); if that time is exceeded, the
test fails.
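For illustration only, the sleep-free check described in 1a might be structured as below. This
is a minimal sketch, not the test from the patch: the names `ConnectResultsSketch` and
`countSuccesses` are hypothetical, and each connection attempt is abstracted as a
`Callable<Boolean>` (true = connected, false or an exception = failed to connect) so the
example runs without a server. A real test would wrap each socket connect in such a callable
and compare the returned count with the expected number of accepted clients.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ConnectResultsSketch {
    /**
     * Hypothetical helper: start every connection attempt first, then collect
     * each outcome under one overall deadline instead of a fixed sleep.
     * Returns how many attempts reported a successful connect; throws
     * AssertionError (i.e. fails the test) if the deadline passes first.
     */
    static int countSuccesses(List<Callable<Boolean>> attempts, long maxWaitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, attempts.size()));
        try {
            // Loop 1: kick off all attempts without waiting on any of them.
            List<Future<Boolean>> results = new ArrayList<>();
            for (Callable<Boolean> attempt : attempts) {
                results.add(pool.submit(attempt));
            }
            // Loop 2: wait for each result, bounded by the shared deadline.
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
            int successes = 0;
            for (Future<Boolean> result : results) {
                long remaining = Math.max(0, deadline - System.nanoTime());
                try {
                    if (Boolean.TRUE.equals(result.get(remaining, TimeUnit.NANOSECONDS))) {
                        successes++;
                    }
                } catch (ExecutionException e) {
                    // The attempt threw (e.g. connection refused): count it as a failure.
                } catch (TimeoutException e) {
                    throw new AssertionError("no connect result within " + maxWaitMillis + " ms", e);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new AssertionError("interrupted while waiting for connect results", e);
                }
            }
            return successes;
        } finally {
            // Cancel anything still running (e.g. after a timeout).
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        List<Callable<Boolean>> attempts = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            attempts.add(() -> true);   // simulated accepted connections
        }
        attempts.add(() -> false);      // simulated rejected connection
        System.out.println(countSuccesses(attempts, 60_000)); // prints 3; 60 s cap as in 1a
    }
}
```

The single shared deadline keeps the worst-case run time bounded at the cap, while a healthy
run finishes as soon as the last result arrives rather than after a fixed sleep.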

b) We should really test the zk client code to ensure that it handles this
correctly/gracefully. It would be great to have an additional Java test and an additional C
test that use ZooKeeper objects and ensure that, when N > max clients are started, the first
max clients work properly, and that the clients > max give proper watcher

2) This is an externally visible change for users, so the docs need to be updated. Hadoop uses
Forrest for docs; ZooKeeper in particular uses the simpledocbook XML format for the doc source
markup.

See the source docs in src/docs/src/documentation/content/xdocs

You can regenerate the documentation using Forrest, which you can download from
Note: Forrest requires JDK 5.

I run Forrest as follows:
    PATH=$PATH:/home/phunt/dev/apache-forrest-0.8/bin JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun \
      ant -Dforrest.home=/home/phunt/dev/apache-forrest-0.8 -Djava5.home=/usr/lib/jvm/java-1.5.0-sun

The generated docs are put into trunk/docs. You probably want to put this into the admin/ops

3) In getClientCnxnCount there's no need for removeall (it's inefficient as well); just use an
int and count the unclosed connections.
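A minimal sketch of the single-pass count suggested in item 3, assuming the connections live in
a list and each one can report its client address and whether it has been closed. `Cnxn`, its
fields, and this `getClientCnxnCount` signature are hypothetical stand-ins, not the patch's
actual code; the point is only that an int counter replaces building a temporary collection and
removing entries from it.

```java
import java.util.ArrayList;
import java.util.List;

public class CnxnCountSketch {
    /** Hypothetical stand-in for a server-side connection record. */
    static final class Cnxn {
        final String clientAddr;
        boolean closed;

        Cnxn(String clientAddr) {
            this.clientAddr = clientAddr;
        }
    }

    /**
     * Counts the unclosed connections from one client address in a single
     * pass: no copy of the collection and no removal step.
     */
    static int getClientCnxnCount(List<Cnxn> cnxns, String clientAddr) {
        int count = 0;
        for (Cnxn c : cnxns) {
            if (!c.closed && c.clientAddr.equals(clientAddr)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Cnxn> cnxns = new ArrayList<>();
        cnxns.add(new Cnxn("10.0.0.1"));
        cnxns.add(new Cnxn("10.0.0.1"));
        cnxns.add(new Cnxn("10.0.0.2"));
        cnxns.get(1).closed = true;  // closed connections are not counted
        System.out.println(getClientCnxnCount(cnxns, "10.0.0.1")); // prints 1
    }
}
```

In a real server the caller would hold whatever lock already guards the connection collection
while iterating; this sketch leaves that out.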

4) In the updated patch I changed the log level for the "max connections exceeded" message
from info to warning. This case comes up often enough, and is important enough, to be logged
as a warning.

5) Note: we have an 80-character limit on line length; the updated patch fixes these issues.
This is a rule we break in certain rare instances, but in general we keep line length to 80
chars.

If you use Eclipse, you can enable General > Editors > Text Editors > Show print margin; I find
this useful.

> single bad client can cause server to stop accepting connections
> ----------------------------------------------------------------
>                 Key: ZOOKEEPER-336
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-336
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: c client, java client, server
>            Reporter: Patrick Hunt
>            Assignee: Henry Robinson
>            Priority: Critical
>             Fix For: 3.2.0
>         Attachments: ZOOKEEPER-336.patch, ZOOKEEPER-336.patch, ZOOKEEPER-336.patch
> One user saw a case where a single mis-programmed client was overloading the server with
> connections - the client was creating a huge number of sessions to the server. This caused
> all of the fds on the server to become used.
> Seems like we should have some way of limiting (configurable override) the maximum number
> of sessions from a single client (say 10 by default?) Also we should output warnings when
> this limit is exceeded (or attempt to exceed).

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
