zookeeper-issues mailing list archives

From "Enrico Olivelli (Jira)" <j...@apache.org>
Subject [jira] [Updated] (ZOOKEEPER-1856) zookeeper C-client can fail to switch from a dead server in a 3+ server ensemble if the client only has a 2 server list.
Date Fri, 06 Sep 2019 15:43:09 GMT

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enrico Olivelli updated ZOOKEEPER-1856:
---------------------------------------
    Fix Version/s: 3.5.7

> zookeeper C-client can fail to switch from a dead server in a 3+ server ensemble if the client only has a 2 server list.
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1856
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1856
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Dutch T. Meyer
>            Assignee: Michi Mutsuzaki
>            Priority: Major
>             Fix For: 3.6.0, 3.5.6, 3.5.7
>
>         Attachments: ZOOKEEPER-1856.patch
>
>
> If a client has a 2 server list, is currently connected to the last server in that list, and that server then goes offline, the addrvec_next() call in handle_error() will push the client to the start of the list and terminate the connection.
> Then zookeeper_interest() will call zoo_cycle_next_server() in response to the connection failure, and the client will cycle back to the failed server.
> In this way, a client that has a list of only 2 servers can get stuck on the one failed server.  This would only be an issue in an ensemble larger than 2, of course, because losing 1 out of 2 servers would lead to quorum loss anyway.
> Other harmonics are possible if every other server in the list has failed, but this is simplest to reproduce in a 3 server ensemble where the client only knows about 2 servers, one of which then fails.  There are probably some elegant fixes here, but I think the simplest is to add a flag to track whether a server has been accessed before, and if it hasn't, don't call zoo_cycle_next_server() at the top of the zookeeper_interest() function.
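
The double advance described above can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions, not the actual client code: `server_list`, `list_advance()`, and `reconnect_target()` are hypothetical names invented for this example, and the real addrvec and zookeeper_interest() logic is more involved. The `use_flag` parameter models the proposed "has this server been accessed before" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the client's server list.
 * This is NOT the real addrvec_t from the ZooKeeper C client. */
typedef struct {
    size_t count;        /* number of servers the client knows about */
    size_t current;      /* index of the server selected for the next attempt */
    bool current_tried;  /* has a connection to `current` been attempted yet? */
} server_list;

/* Select the next server, wrapping around to the start of the list. */
void list_advance(server_list *l)
{
    assert(l->count > 0);
    l->current = (l->current + 1) % l->count;
    l->current_tried = false;  /* the newly selected server is untried */
}

/* Model one failure-and-reconnect round and return the index of the
 * server the client attempts next.
 *   use_flag == false: always advance a second time (the reported behavior)
 *   use_flag == true:  skip the second advance if `current` is untried
 *                      (the fix suggested in the report) */
size_t reconnect_target(server_list *l, bool use_flag)
{
    /* Error-handling path: move off the server that just died. */
    list_advance(l);

    /* Reconnect path: cycle again unless the flag says the selected
     * server has not been tried yet. */
    if (!use_flag || l->current_tried) {
        list_advance(l);
    }

    l->current_tried = true;  /* a connection attempt is now made */
    return l->current;
}
```

In this model, a 2 server list with the client on index 1 returns index 1 again (the dead server) when `use_flag` is false, and index 0 when it is true. With a 3 server list the double advance merely skips one server, which matches the report's point that the problem shows up when the client's list has only 2 entries.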



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
