zookeeper-issues mailing list archives

From "Hudson (Jira)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3510) Frequent 'zkServer.sh stop' failures when running C test suite
Date Fri, 23 Aug 2019 14:54:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914319#comment-16914319 ]

Hudson commented on ZOOKEEPER-3510:
-----------------------------------

SUCCESS: Integrated in Jenkins build Zookeeper-trunk-single-thread #506 (See [https://builds.apache.org/job/Zookeeper-trunk-single-thread/506/])
ZOOKEEPER-3510: Make 'zkServer.sh stop' more reliable (nkalmar: rev 942213dfe28e464f068f8a195d1424c4b29af585)
* (edit) bin/zkServer.sh


> Frequent 'zkServer.sh stop' failures when running C test suite
> --------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3510
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3510
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Damien Diederen
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.6.0, 3.5.6
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> As mentioned in https://github.com/apache/zookeeper/pull/1054#discussion_r314208678 :
> There is a {{sleep 3}} statement in {{zkServer.sh restart}}.  I am unable to unearth
> the history of that particular line, but I believe part, if not all, of that {{sleep}} should
> be part of {{zkServer.sh stop}}.
> I frequently observe {{FAILED TO START}} errors in the C test suite; the logs consistently
> show that those are caused by {{java.net.BindException: Address already in use}}.  Adding
> a simple {{sleep 1}} before {{echo STOPPED}} "fixes" it for me.  I will submit an initial
> PR with the corresponding change and a commit message akin to:
> ----
> ZOOKEEPER-XXXX: Make zkServer.sh stop more reliable
> Kill is asynchronous, and without the sleep, the server's TCP port can still be busy
> when the next server is started, causing flaky runs of the C client's test suite.
> (It would probably be better to spin a few times, probing with ps -p.)
> ----
> As noted above, the sleep is far from optimal; an adaptive mechanism would be better,
> but I do not want to make the first iteration too complicated.
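
The adaptive mechanism mentioned above (spinning a few times and probing with ps -p) could look roughly like the sketch below. This is only an illustration, not the change that was committed to bin/zkServer.sh: the function name wait_for_exit, its arguments, and the default values are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper (not taken from bin/zkServer.sh): poll with `ps -p`
# until the given PID is gone or the probe budget is exhausted.
#   $1 = PID to watch
#   $2 = maximum number of probes (assumed default: 10)
#   $3 = seconds to sleep between probes (assumed default: 1)
# Returns 0 if the process has exited, 1 if it is still present.
wait_for_exit() {
    pid=$1
    attempts=${2:-10}
    interval=${3:-1}
    while [ "$attempts" -gt 0 ]; do
        # `ps -p` exits non-zero when no process with this PID exists.
        if ! ps -p "$pid" > /dev/null 2>&1; then
            return 0
        fi
        sleep "$interval"
        attempts=$((attempts - 1))
    done
    return 1
}
```

A stop routine might call this right after kill and print STOPPED only when wait_for_exit returns 0. Compared with a fixed sleep 1, this returns as soon as the process is gone and still bounds the total wait. Note that it only confirms the process has exited; it does not itself check whether the TCP port has been released.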



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
