From: Rakesh Radhakrishnan
Date: Thu, 2 Mar 2017 20:18:37 +0530
Subject: Re: Sessions Expire due to Network partitioning in Zookeeper
To: "user@zookeeper.apache.org"

>>> You mentioned that a client sends a ping every 1/3 the session timeout.

Yes, you are correct. To analyse your issue, we also have to consider the
re-connection timeout, which is "sessiontimeout/listed servers count":

https://github.com/apache/zookeeper/blob/branch-3.4/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1292
https://github.com/apache/zookeeper/blob/branch-3.4/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1098

Coincidentally, in your example the heartbeat interval and the re-connection
interval are the same, because you have three servers.

>>>>> It looks like C3 has taken 14 seconds to determine the disconnected event
>>>>> and another 14 seconds to determine that it cannot connect to Server B
>>>>> (C3 is isolated from B).

With this info, the total elapsed time is 28 secs, which is less than the
45 secs session timeout. Now the client has 17 secs (45 secs - 28 secs) to
re-establish a connection with server A, right? Could you please check
whether the client is connecting to A during this period?

Rakesh

On Thu, Mar 2, 2017 at 6:58 PM, Tharindu Kumara wrote:

> Hi Rakesh,
>
> First of all thank you for the quick reply.
>
> >>>>> Actually, ZooKeeper client has retry mechanism.
> >>>>> Client sends a ping every 1/3 the session timeout (here, 3 is the no.
> of listed servers, A, B, C) and then looks for a response before another
> 1/3 elapses. This allows time to reconnect to a different server (and still
> maintain the session) if the connected server becomes unavailable.
>
> You mentioned that a client sends a ping every 1/3 the session timeout. And
> 3 is the no of listed servers.
>
> I doubt that. Because, I am using the C Binding and after inspecting the
> code it looks like that 3 is a hard coded value.
> Simply, no matter what the number of clients, the zk client binding always
> sends a ping every 1/3 session timeout.
>
> Can you please clarify that for me?
>
> Here I used a tick of 3000ms and a session expiration timeout of 45000ms.
>
> And please find the screenshot of the extracted client log output.
>
> https://anonimag.es/image/JT9htnL
>
> It looks like C3 has taken 14 seconds to determine the disconnected event
> and another 14 seconds to determine that it cannot connect to Server B
> (C3 is isolated from B).
>
> On Thu, Mar 2, 2017 at 4:08 PM, Rakesh Radhakrishnan wrote:
>
> > >>>> According to my understanding, it looks like, when a client trying to
> > >>>> connect to a server that it cannot connect due to a network partitioning,
> > >>>> it uses a blocking call and it waits too much time trying to
> > >>>> connect to a server that it cannot communicate.
> >
> > Actually, ZooKeeper client has retry mechanism.
> > Client sends a ping every 1/3 the session timeout (here, 3 is the no. of
> > listed servers, A, B, C) and then looks for a response before another 1/3
> > elapses. This allows time to reconnect to a different server (and still
> > maintain the session) if the connected server becomes unavailable.
> >
> > Could you grep the following log message in your client log and tell me
> > how much time C3 took for the re-connection attempts?
> > "Client session timed out, have not heard from server in "
> >
> > C3 might have first attempted to reconnect to B and then A. Also, we need
> > to check how much time C3 took to detect the connection failure from
> > server C.
> >
> > Could you please share the zk client log to dig more.
> >
> > Rakesh
> >
> > On Thu, Mar 2, 2017 at 11:04 AM, Tharindu Kumara <zonik.hatkumara@gmail.com>
> > wrote:
> >
> > > > 1) Could you tell me the status of Server C, is this lost connection to
> > > > the quorum and fails to join quorum continuously as B is the Leader ?
> > >
> > > Yes, B is the leader. Server C is completely isolated from the Leader (B)
> > > and cannot communicate with it. C is continuously unable to connect to
> > > the Leader.
> > >
> > > > 2) C3 is connected C. Please tell me the connection host string passed
> > > > to this client. Does it contains all three servers info
> > > > "A:clientport, B:clientport, C:clientport" ?
> > >
> > > Yes, C3's connection string contains all three servers. ("A:clientport,
> > > B:clientport, C:clientport")
> > >
> > > > 3) Please check all three servers and client C3 logs to see any
> > > > inconsistencies or exceptions.
> > >
> > > After looking at the logs, it seems that when server C is isolated from
> > > the Leader, a disconnect event fires to client C3. Then it (C3) spends
> > > too much time trying to connect to Server B (Leader).
> > >
> > > But it cannot connect to server B, as we blocked the connection between
> > > Server C and Server B. Basically, C3 spends more than half of the session
> > > timeout trying to connect to Server B.
> > >
> > > Then, after figuring out that it cannot connect to Server B, C3 tries to
> > > connect to Server A, and it connects to Server A successfully. But this
> > > is too late, because the session has already expired by the time C3
> > > connects.
> > >
> > > And this happens only sometimes. Because when we specify all the servers
> > > in the client's connect string, sometimes after C3 disconnects from
> > > Server C, instead of trying to connect to Server B it connects to
> > > Server A as the first attempt. In this case the client C3 connects to
> > > the quorum successfully before the session expiration.
> > >
> > > According to my understanding, it looks like, when a client tries to
> > > connect to a server that it cannot reach due to a network partition, it
> > > uses a blocking call and waits too long trying to connect to a server
> > > that it cannot communicate with.
> > >
> > > > 4) ZooKeeper version used in your testing ?
> > >
> > > I used zookeeper 3.4.9 (current stable release)
> > >
> > > On Thu, Mar 2, 2017 at 7:48 AM, Rakesh Radhakrishnan <rakeshr@apache.org>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Could you please give few more details,
> > > >
> > > > 1) Could you tell me the status of Server C, is this lost connection to
> > > > the quorum and fails to join quorum continuously as B is the Leader ?
> > > >
> > > > 2) C3 is connected C. Please tell me the connection host string passed
> > > > to this client. Does it contains all three servers info "A:clientport,
> > > > B:clientport, C:clientport" ?
> > > >
> > > > 3) Please check all three servers and client C3 logs to see any
> > > > inconsistencies or exceptions.
> > > >
> > > > 4) ZooKeeper version used in your testing ?
> > > >
> > > > Rakesh
> > > >
> > > > On Wed, Mar 1, 2017 at 4:55 PM, Tharindu Kumara
> > > > <zonik.hatkumara@gmail.com> wrote:
> > > >
> > > > > Recently, I carried out a test to find the behavior of clients when a
> > > > > client is partitioned from the ensemble.
> > > > >
> > > > > Here I used an ensemble of 3 zookeeper servers called A, B and C. And
> > > > > the quorum was set up like below.
> > > > >
> > > > > A - Follower
> > > > > B - Leader
> > > > > C - Follower
> > > > >
> > > > > A <---> B <---> C
> > > > >  \____________/
> > > > >
> > > > > And 3 clients are connected to the ensemble like below.
> > > > >
> > > > > C1 is connected A
> > > > > C2 is connected B
> > > > > C3 is connected C.
> > > > >
> > > > > I used iptables to remove the network link between B and C.
> > > > >
> > > > > command used: iptables -I INPUT -s 123.123.45.123 -j DROP
> > > > >
> > > > > After removing the link, the connections look like below.
> > > > >
> > > > > A <----> B      C
> > > > >  \____________/
> > > > >
> > > > > Simply, there is no way to communicate from B to C and vice versa.
> > > > >
> > > > > What I noticed here is that the client connected to Zookeeper Server
> > > > > "C" could not connect to the ensemble, resulting in a session
> > > > > expiration timeout.
> > > > >
> > > > > For this experiment I used a tickTime of 3000ms and a client session
> > > > > expiration timeout of 45000ms. I also tested with different
> > > > > combinations.
> > > > >
> > > > > Can someone please explain what is the root cause for this behavior?
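
The timeout arithmetic discussed at the top of this thread can be sketched as follows. This is only a rough illustration, not the client's actual code: it assumes the Java client derives its read timeout as 2/3 of the session timeout, pings at half the read timeout, and uses "session timeout / listed servers count" as the connect timeout, as the linked ClientCnxn.java appears to do. The function names are hypothetical, and the C binding may compute these values differently.

```python
def client_timeouts(session_timeout_ms, server_count):
    """Derive client-side timeouts from the session timeout.

    Assumption: mirrors the formulas discussed in this thread
    (Java client, branch-3.4); not taken from the C binding.
    """
    read_timeout = session_timeout_ms * 2 // 3        # give up on a silent server
    connect_timeout = session_timeout_ms // server_count  # per-server connect attempt
    ping_interval = read_timeout // 2                 # i.e. 1/3 of the session timeout
    return {
        "read_timeout_ms": read_timeout,
        "connect_timeout_ms": connect_timeout,
        "ping_interval_ms": ping_interval,
    }


def remaining_budget(session_timeout_ms, elapsed_ms):
    """Time left to reach another server before the session expires."""
    return session_timeout_ms - sum(elapsed_ms)


if __name__ == "__main__":
    # Values from the thread: 45000 ms session timeout, servers A, B, C.
    print(client_timeouts(45000, 3))
    # 14 s to detect the disconnect from C, 14 s failing to reach B.
    print(remaining_budget(45000, [14000, 14000]))
```

With three listed servers the ping interval and the connect timeout come out equal, which is the coincidence pointed out above. With a different number of servers only the connect timeout would change under these assumptions.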