zookeeper-dev mailing list archives

From "Bogdan Kanivets (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ZOOKEEPER-2916) startSingleServerTest may be flaky
Date Sun, 26 Nov 2017 18:46:01 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266130#comment-16266130 ]

Bogdan Kanivets edited comment on ZOOKEEPER-2916 at 11/26/17 6:45 PM:
----------------------------------------------------------------------

I don't have a solution yet, but comparing successful and failed runs suggests the problem
is in leader election after these calls:

{code:java}
startObservers(observerStrings);
testReconfig(follower2, true, reconfigServers); //add participants
testReconfig(follower2, true, observerStrings); //change to observers
{code}

Here the observers are started as participants and take part in the election; they are
converted to observers later.

Looking at the failed run:
https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/1701/

When I filter by the assigned ports:

{code:java}
grep "1122[2-9]\|1123[0-6]" consoleFull-jdk7.html
{code}
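The grep pattern above matches the contiguous port range 11222 to 11236 used in this run. As a rough sketch only, the same filter could be written in Java as follows; the class and method names are hypothetical and are not part of the ZooKeeper test code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ZooKeeper: keeps log lines that mention an assigned port.
public class PortFilter {
    // Same alternation as the grep pattern: 11222-11229 or 11230-11236.
    private static final Pattern PORTS = Pattern.compile("1122[2-9]|1123[0-6]");

    // Returns only the lines that contain one of the assigned ports.
    // Like grep, this is a substring match, so a longer number containing
    // one of these ports would also be kept.
    public static List<String> filter(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (PORTS.matcher(line).find()) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```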


After the second observer is up:
[junit] 2017-11-16 19:53:26,403 [myid:4] - INFO  [Thread-11:NIOServerCnxnFactory@686] - binding to port localhost/127.0.0.1:11234

there is only one "Restarting Leader Election":
[junit] 2017-11-16 19:53:26,737 [myid:3] - WARN  [QuorumPeer[myid=3](plain=/127.0.0.1:11231)(secure=disabled):QuorumPeer@1427] - Restarting Leader Election
and then, 20s later:
[junit] 2017-11-16 19:53:46,715 [myid:3] - WARN  [localhost/127.0.0.1:11233:QuorumCnxManager@348] - Exception reading or writing challenge: java.net.SocketTimeoutException: Read timed out
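The "20s later" figure can be checked from the two timestamps in the failed run. A minimal sketch, assuming the log timestamps follow the layout shown above (the class name LogGap is made up for illustration):

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: computes the gap between two [junit] log timestamps.
public class LogGap {
    // Timestamp layout seen in the log lines, e.g. "2017-11-16 19:53:26,737".
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Milliseconds elapsed from the first timestamp to the second.
    public static long millisBetween(String first, String second) {
        LocalDateTime a = LocalDateTime.parse(first, FMT);
        LocalDateTime b = LocalDateTime.parse(second, FMT);
        return Duration.between(a, b).toMillis();
    }

    public static void main(String[] args) {
        // "Restarting Leader Election" to "Read timed out" in the failed run.
        System.out.println(millisBetween(
                "2017-11-16 19:53:26,737", "2017-11-16 19:53:46,715")); // prints 19978
    }
}
```

So the timeout is logged 19.978s after the single election restart.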

On the successful run:
https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/1702

{code:java}
grep "2738[0-9]\|2739[0-4]" consoleFull-jdk7-success.html
{code}


After the second observer starts:
[junit] 2017-11-17 20:18:40,311 [myid:4] - INFO  [Thread-11:NIOServerCnxnFactory@686] - binding to port localhost/127.0.0.1:27392

there are leader election restarts from two peers:
[junit] 2017-11-17 20:18:43,891 [myid:4] - WARN  [QuorumPeer[myid=4](plain=/127.0.0.1:27392)(secure=disabled):QuorumPeer@1427] - Restarting Leader Election
[junit] 2017-11-17 20:18:43,894 [myid:3] - WARN  [QuorumPeer[myid=3](plain=/127.0.0.1:27389)(secure=disabled):QuorumPeer@1427] - Restarting Leader Election
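The difference between the two runs is which peers log "Restarting Leader Election". A small sketch for counting those records per myid, assuming each log record has first been joined onto a single line (the archive wraps them); RestartCounter is a hypothetical name, not ZooKeeper code:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts election restarts per server id in a filtered log.
public class RestartCounter {
    // Captures the server id from the "[myid:N]" field of a log record.
    private static final Pattern MYID = Pattern.compile("\\[myid:(\\d*)\\]");

    // Maps each myid to the number of "Restarting Leader Election" records it logged.
    public static Map<String, Integer> count(List<String> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : records) {
            if (!record.contains("Restarting Leader Election")) {
                continue;
            }
            Matcher m = MYID.matcher(record);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

On the records quoted here, the failed run would yield an entry for myid 3 only, while the successful run yields entries for both myid 3 and myid 4.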

There is no "Read timed out", and the test is done after 3s:
[junit] 2017-11-17 20:18:46,133 [myid:] - INFO  [main:StandaloneDisabledTest@114] - Configuration after adding two observers:
[junit] server.2=localhost:27387:27388:participant;localhost:27386
[junit] server.3=localhost:27390:27391:observer;localhost:27389
[junit] server.4=localhost:27393:27394:observer;localhost:27392
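To check the roles in a configuration dump like this one programmatically, the server lines can be parsed. The sketch below is illustrative only: ServerSpec is a hypothetical class, and it handles just the shape printed in this log (server.<id>=<host>:<port>:<port>:<role>;<host>:<clientPort>), not every form ZooKeeper accepts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the server lines printed in the test log.
public class ServerSpec {
    // Only covers the shape seen in the log:
    // server.<id>=<host>:<port>:<port>:<role>;<host>:<clientPort>
    private static final Pattern SPEC = Pattern.compile(
            "server\\.(\\d+)=([^:;]+):(\\d+):(\\d+):(participant|observer);([^:;]+):(\\d+)");

    public final int id;
    public final String role;
    public final int clientPort;

    private ServerSpec(int id, String role, int clientPort) {
        this.id = id;
        this.role = role;
        this.clientPort = clientPort;
    }

    // Parses one line; a leading "[junit] " prefix is tolerated because find() is used.
    public static ServerSpec parse(String line) {
        Matcher m = SPEC.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a server spec: " + line);
        }
        return new ServerSpec(
                Integer.parseInt(m.group(1)), m.group(5), Integer.parseInt(m.group(7)));
    }

    public boolean isObserver() {
        return "observer".equals(role);
    }
}
```

For the three lines in the successful run, this reports server 2 as a participant and servers 3 and 4 as observers.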




> startSingleServerTest may be flaky
> ----------------------------------
>
>                 Key: ZOOKEEPER-2916
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2916
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: tests
>    Affects Versions: 3.5.3, 3.6.0
>            Reporter: Patrick Hunt
>            Assignee: Bogdan Kanivets
>              Labels: newbie
>
> startSingleServerTest seems to be failing intermittently. 10 times in the first few days of this month. Can someone take a look?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
