zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Flavio Junqueira (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ZOOKEEPER-2466) Client skips servers when trying to connect
Date Tue, 05 Jul 2016 21:05:10 GMT
Flavio Junqueira created ZOOKEEPER-2466:
-------------------------------------------

             Summary: Client skips servers when trying to connect
                 Key: ZOOKEEPER-2466
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2466
             Project: ZooKeeper
          Issue Type: Bug
          Components: c client
            Reporter: Flavio Junqueira
            Assignee: Flavio Junqueira
            Priority: Critical
             Fix For: 3.5.3, 3.6.0


I've been looking at {{Zookeeper_simpleSystem::testFirstServerDown}} and I observed the following
behavior. The list of servers to connect contains two servers, let's call them S1 and S2.
The client never connects, but the odd bit is the sequence of servers that the client tries
to connect to:

{noformat}
S1
S2
S1
S1
S1
<keeps repeating S1>
{noformat}

It intrigued me that S2 is only tried once and never again. Checking the code, here is what
happens. Initially, {{zh->reconfig}} is 1, so in {{zoo_cycle_next_server}} we return an
address from {{get_next_server_in_reconfig}}, which is taken from {{zh->addrs_new}} in
this test case. The attempt to connect fails, and {{handle_error}} is invoked in the error
handling path. {{handle_error}} actually invokes {{addrvec_next}} which changes the address
pointer to the next server on the list.

After two attempts, it decides that it has tried all servers in {{zoo_cycle_next_server}}
and sets {{zh->reconfig}} to zero. Once {{zh->reconfig == 0}}, we have that each call
to {{zoo_cycle_next_server}} moves the address pointer to the next server in {{zh->addrs}}.
But, given that {{handle_error}} also moves the pointer to the next server, we end up moving
the pointer ahead twice upon every failed attempt to connect, which is wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message