zookeeper-issues mailing list archives

From "Damien Diederen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ZOOKEEPER-3510) Frequent 'zkServer.sh stop' failures when running C test suite
Date Thu, 15 Aug 2019 08:19:00 GMT
Damien Diederen created ZOOKEEPER-3510:
------------------------------------------

             Summary: Frequent 'zkServer.sh stop' failures when running C test suite
                 Key: ZOOKEEPER-3510
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3510
             Project: ZooKeeper
          Issue Type: Bug
            Reporter: Damien Diederen


As mentioned in https://github.com/apache/zookeeper/pull/1054#discussion_r314208678 :

There is a {{sleep 3}} statement in {{zkServer.sh restart}}.  I am unable to unearth the history
of that particular line, but I believe part—if not all—of that {{sleep}} should be part
of {{zkServer.sh stop}}.

I frequently observe {{FAILED TO START}} errors in the C test suite; the logs consistently
show that those are caused by {{java.net.BindException: Address already in use}}.  Adding
a simple {{sleep 1}} before {{echo STOPPED}} "fixes" it for me.  I will submit an initial
PR with the corresponding change and a commit message akin to:

----

ZOOKEEPER-XXXX: Make zkServer.sh stop more reliable

Kill is asynchronous, and without the sleep, the server's TCP port can still be busy when
the next server is started—causing flaky runs of the C client's test suite.

(It would probably be better to spin a few times, probing with ps -p.)

----

As noted above, the sleep is far from optimal, and an adaptive mechanism would be better, but
I do not want to make the first iteration too complicated.
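
For illustration only, the adaptive mechanism could look roughly like the sketch below. It is not taken
from {{zkServer.sh}}: the function name {{wait_for_exit}} and the retry count are hypothetical, and it
probes with {{kill -0}} (a shell builtin) rather than the {{ps -p}} mentioned in the commit message;
either probe serves the same purpose here.

```shell
#!/bin/sh
# Hypothetical helper, not part of zkServer.sh.
# Usage: wait_for_exit PID [MAX_TRIES]
# Polls once per second until the process is gone or MAX_TRIES is used up.
# Returns 0 if the process has exited, 1 if it is still alive at the end.
wait_for_exit() {
    pid=$1
    tries=${2:-10}   # assumed default; tune to taste
    while [ "$tries" -gt 0 ]; do
        # kill -0 sends no signal; it only checks whether the PID can be
        # signalled. It also fails for a process owned by another user,
        # so this assumes the caller owns the server process.
        if ! kill -0 "$pid" 2>/dev/null; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

In a {{stop}} path, this would be called right after the {{kill}} and before {{echo STOPPED}}, so that
the common case returns as soon as the server is gone instead of always paying for a fixed sleep.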



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
