Date: Tue, 5 Jul 2016 21:05:10 +0000 (UTC)
From: "Flavio Junqueira (JIRA)"
To: dev@zookeeper.apache.org
Subject: [jira] [Created] (ZOOKEEPER-2466) Client skips servers when trying to connect

Flavio Junqueira created ZOOKEEPER-2466:
-------------------------------------------

             Summary: Client skips servers when trying to connect
                 Key: ZOOKEEPER-2466
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2466
             Project: ZooKeeper
          Issue Type: Bug
          Components: c client
            Reporter: Flavio Junqueira
            Assignee: Flavio Junqueira
            Priority: Critical
             Fix For: 3.5.3, 3.6.0


I've been looking at {{Zookeeper_simpleSystem::testFirstServerDown}} and observed the following behavior. The list of servers to connect to contains two servers; let's call them S1 and S2. The client never connects, but the odd bit is the sequence of servers that the client tries to connect to:

{noformat}
S1
S2
S1
S1
S1
{noformat}

It intrigued me that S2 is tried only once and never again.

Checking the code, here is what happens. Initially, {{zh->reconfig}} is 1, so {{zoo_cycle_next_server}} returns an address from {{get_next_server_in_reconfig}}, which in this test case is taken from {{zh->addrs_new}}. The connection attempt fails, and {{handle_error}} is invoked on the error-handling path. {{handle_error}} invokes {{addrvec_next}}, which advances the address pointer to the next server on the list. After two attempts, {{zoo_cycle_next_server}} concludes that it has tried all servers and sets {{zh->reconfig}} to zero.

Once {{zh->reconfig == 0}}, each call to {{zoo_cycle_next_server}} advances the address pointer to the next server in {{zh->addrs}}. But because {{handle_error}} also advances the pointer, the pointer moves ahead twice on every failed connection attempt, which is wrong. With only two servers, advancing twice wraps around to the same server, which is consistent with the repeated S1 attempts above.