From: hanm
To: dev@zookeeper.apache.org
Reply-To: dev@zookeeper.apache.org
Subject: [GitHub] zookeeper issue #191: ZOOKEEPER-2722: fix flaky testSessionEstablishment tes...
Date: Thu, 16 Mar 2017 18:10:32 +0000 (UTC)

Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/191

>> if that creation is failing due to connection loss, shouldn't the places that check the watcher connection fail there instead of in your check?

ConnectionLossException can happen *after* a connection between the ZooKeeper client and server has been established, right? So having the check only in the watcher is not enough. A pass in the watcher does not guarantee that ConnectionLossException will not occur at a later point in time. Imagine an extreme case where a network partition happens between client and server after session establishment: the client first gets a connected event and the watcher happily reports that everything is fine, then subsequent operations (e.g. create) fail with ConnectionLossException until the network heals.

>> I think it's worth understanding why we are getting a connection event in the watcher that should be waiting for connection, but still failing by not connecting, instead of fixing this with additional waiting.

Yes, I'd like to know what causes this, though I have had a hard time reproducing the failure locally or in internal Jenkins. So far it is only reproducible in Apache Jenkins. I can add some logging to capture more context when the failure happens on Apache Jenkins, but in that case the retry logic in create is still needed, unless we can prove it is not possible to get a ConnectionLossException after session establishment.
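
For readers following the thread, the idea of retrying create on connection loss can be sketched roughly as below. This is not the code from the pull request; it is a minimal illustration under stated assumptions. The nested `ConnectionLossException` is a stand-in for `org.apache.zookeeper.KeeperException.ConnectionLossException` so the example runs without the ZooKeeper jar, and `ZkOperation`, `retryOnConnectionLoss`, `maxAttempts`, and `delayMs` are hypothetical names introduced only for this sketch.

```java
class RetryOnConnectionLoss {

    /**
     * Stand-in for KeeperException.ConnectionLossException (assumption:
     * used so the sketch compiles without the ZooKeeper client library).
     */
    static class ConnectionLossException extends Exception {
        ConnectionLossException(String message) {
            super(message);
        }
    }

    /** Hypothetical wrapper type for a client call such as create. */
    interface ZkOperation<T> {
        T run() throws ConnectionLossException;
    }

    /**
     * Runs the operation, retrying only when it fails with connection loss.
     * Gives up and rethrows the last failure after maxAttempts tries.
     */
    static <T> T retryOnConnectionLoss(ZkOperation<T> op, int maxAttempts, long delayMs)
            throws ConnectionLossException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConnectionLossException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.run();
            } catch (ConnectionLossException e) {
                last = e;
                // Wait before the next try, but not after the final failure.
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulate a partition that heals after two failed attempts.
        int[] calls = {0};
        String path = retryOnConnectionLoss(() -> {
            if (++calls[0] < 3) {
                throw new ConnectionLossException("simulated partition");
            }
            return "/test-node";
        }, 5, 10L);
        System.out.println(path + " created after " + calls[0] + " attempts");
    }
}
```

One caveat with retrying create in a real client: a create that fails with connection loss may already have been applied on the server, so a retry can then fail because the node exists. A real test helper would likely need to treat that case as success or check for the node first.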