zookeeper-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2466) Client skips servers when trying to connect
Date Thu, 10 May 2018 20:47:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471086#comment-16471086 ]

Hadoop QA commented on ZOOKEEPER-2466:

-1 overall.  Here are the results of testing the latest attachment 
  against trunk revision 2fa315b7d0ed65828479fcdcc9e76ca8552fba4a.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3681//console

This message is automatically generated.

> Client skips servers when trying to connect
> -------------------------------------------
>                 Key: ZOOKEEPER-2466
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2466
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Flavio Junqueira
>            Assignee: Michael Han
>            Priority: Critical
>             Fix For: 3.6.0, 3.5.5
>         Attachments: ZOOKEEPER-2466.patch, ZOOKEEPER-2466.patch
> I've been looking at {{Zookeeper_simpleSystem::testFirstServerDown}} and I observed the
> following behavior. The list of servers to connect to contains two servers; let's call them S1
> and S2. The client never connects, but the odd bit is the sequence of servers that the client
> tries to connect to:
> {noformat}
> S1
> S2
> S1
> S1
> S1
> <keeps repeating S1>
> {noformat}
> It intrigued me that S2 is tried only once and never again. Checking the code, here is
> what happens. Initially, {{zh->reconfig}} is 1, so in {{zoo_cycle_next_server}} we return
> an address from {{get_next_server_in_reconfig}}, which is taken from {{zh->addrs_new}}
> in this test case. The attempt to connect fails, and {{handle_error}} is invoked in the error
> handling path. {{handle_error}} in turn invokes {{addrvec_next}}, which moves the address
> pointer to the next server on the list.
> After two attempts, the client decides in {{zoo_cycle_next_server}} that it has tried all
> servers and sets {{zh->reconfig}} to zero. Once {{zh->reconfig == 0}}, each call
> to {{zoo_cycle_next_server}} moves the address pointer to the next server in {{zh->addrs}}.
> But, given that {{handle_error}} also moves the pointer to the next server, we end up moving
> the pointer ahead twice upon every failed attempt to connect, which is wrong.

This message was sent by Atlassian JIRA
