From: Szalay-Bekő Máté
Date: Mon, 9 Nov 2020 09:12:35 +0100
Subject: Re: zookeeper session issue with 3.5.x version
To: UserZooKeeper

Hello Vik,

This issue reminds me of
https://issues.apache.org/jira/browse/ZOOKEEPER-3940
Can you double-check whether you are seeing the same issue? I think
ZOOKEEPER-3940 is Docker-related. Are you using a dockerized ZooKeeper?

If you have a different problem, then I recommend filing a Jira ticket and
attaching debug logs from all three ZooKeeper server processes.

Kind regards,
Mate

On Sat, Nov 7, 2020 at 9:28 PM vikramark s wrote:
> Hi,
>
> I am relatively new to ZooKeeper and I am struggling to resolve an issue we
> are experiencing. We recently upgraded our ZooKeeper version from 3.4.x to
> 3.5.8, and we are seeing some issues which we think are related to session
> sharing among nodes.
>
> I was able to recreate the issue with a sample ZooKeeper setup: I am not
> able to set up a new session after taking down the leader in a 3-node
> cluster. The same flow works with ZooKeeper 3.4.14 but not with 3.5.8. I am
> hoping there is some setting I am overlooking, as I can't find anyone else
> reporting this online.
>
> Below are the details.
>
> 3-node cluster.
> After starting all the ZooKeeper nodes (each one reports "Zookeeper version:
> 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:07 GMT"):
>
>                               Zoo1         Zoo2         Zoo3
> Latency min/avg/max:          0/0/0        0/0/0        0/0/0
> Received:                     3            3            2
> Sent:                         2            2            1
> Connections:                  1            1            1
> Outstanding:                  0            0            0
> Zxid:                         0x0          0x100000000  0x100000000
> Mode:                         follower     leader       follower
> Node count:                   5            5            5
> Proposal sizes last/min/max:  n/a          -1/-1/-1     n/a
>
> After starting one session using zkCli.sh on the Zoo1 node:
>
>                               Zoo1         Zoo2         Zoo3
> Latency min/avg/max:          1/9/23       0/0/0        0/0/0
> Received:                     7            4            3
> Sent:                         6            3            2
> Connections:                  2            1            1
> Outstanding:                  0            0            0
> Zxid:                         0x100000001  0x100000001  0x100000001
> Mode:                         follower     leader       follower
> Node count:                   5            5            5
> Proposal sizes last/min/max:  n/a          36/36/36     n/a
>
> *Note: We can see that the Zxid is now consistent across all nodes.*
>
> I then shut down the leader node, Zoo2. I can see that Zoo3 became the leader.
> But for some reason the Zxid is not the same on Zoo1 and Zoo3.
>
> I then closed the existing zkCli and started a new zkCli.sh session on the
> same node (Zoo1). The session was not established: the CLI client just
> keeps retrying, and this created many outstanding requests on Zoo1. The
> only way to recover now is to shut down all nodes and restart them.
> (Currently, if the leader node goes down, our Kafka cluster stops working.)
>
>                               Zoo1         Zoo2 (down)  Zoo3
> Latency min/avg/max:          0/0/2        -            0/0/0
> Received:                     50           -            1
> Sent:                         43           -            0
> Connections:                  2            -            1
> Outstanding:                  6            -            0
> Zxid:                         0x100000001  -            0x200000000
> Mode:                         follower     -            leader
> Node count:                   5            -            5
> Proposal sizes last/min/max:  n/a          -            -1/-1/-1
>
> *Question: Why is the client not able to establish the session on Zoo1?*
>
> But a similar flow with ZooKeeper 3.4.14 works fine.
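A ZooKeeper zxid carries the leader epoch in its high 32 bits and a per-epoch transaction counter in its low 32 bits, which is why a freshly elected leader reports a value such as 0x200000000. A minimal Python sketch (the helper `split_zxid` is a hypothetical name, not a ZooKeeper API) that decodes the Zxid values Zoo1 and Zoo3 report after the failover:

```python
from typing import Tuple

EPOCH_SHIFT = 32            # the epoch lives in the high 32 bits
COUNTER_MASK = 0xFFFFFFFF   # the counter lives in the low 32 bits


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a zxid into (epoch, counter)."""
    return zxid >> EPOCH_SHIFT, zxid & COUNTER_MASK


# Zxid values reported by Zoo1 and Zoo3 after Zoo2 (the old leader) was stopped.
observed = {"Zoo1": 0x100000001, "Zoo3": 0x200000000}

for node, zxid in observed.items():
    epoch, counter = split_zxid(zxid)
    print(f"{node}: zxid={zxid:#x} epoch={epoch} counter={counter}")
# Zoo1: zxid=0x100000001 epoch=1 counter=1
# Zoo3: zxid=0x200000000 epoch=2 counter=0
```

The new leader has opened epoch 2, while the last transaction Zoo1 has applied is still from epoch 1. The 3.4.14 run shows the same split right after the election, until the new session's transaction (0x200000001) is applied on Zoo1, so the mismatch by itself may not be the fault.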
> Below are the details of the 3.4.14 run.
>
> Initial setup (each node reports "Zookeeper version:
> 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT"):
>
>                               Zoo1         Zoo2         Zoo3
> Latency min/avg/max:          0/0/0        0/0/0        0/0/0
> Received:                     1            1            1
> Sent:                         0            0            0
> Connections:                  1            1            1
> Outstanding:                  0            0            0
> Zxid:                         0x0          0x100000000  0x100000000
> Mode:                         follower     leader       follower
> Node count:                   4            4            4
> Proposal sizes last/min/max:  n/a          -1/-1/-1     n/a
>
> After connecting with zkCli on Zoo1:
>
>                               Zoo1         Zoo2         Zoo3
> Latency min/avg/max:          0/14/33      0/0/0        0/0/0
> Received:                     5            2            2
> Sent:                         4            1            1
> Connections:                  2            1            1
> Outstanding:                  0            0            0
> Zxid:                         0x100000001  0x100000001  0x100000001
> Mode:                         follower     leader       follower
> Node count:                   4            4            4
> Proposal sizes last/min/max:  n/a          36/36/36     n/a
>
> *Note: The Zxid is now the same on all the nodes.*
>
> After shutting down the leader node, Zoo2, I can see that Zoo3 became the leader.
> For some reason the Zxid is initially not the same on Zoo1 and Zoo3: Zoo3
> has a new Zxid because a new epoch was created, but Zoo1 still has the old
> one.
>
> I closed the existing zkCli and started a new zkCli.sh session on the same
> node (Zoo1). This time the session was established and the Zxid was synced
> as well.
>
>                               Zoo1         Zoo2 (down)  Zoo3
> Latency min/avg/max:          0/1/4        -            0/0/0
> Received:                     8            -            3
> Sent:                         7            -            2
> Connections:                  2            -            1
> Outstanding:                  0            -            0
> Zxid:                         0x200000001  -            0x200000001
> Mode:                         follower     -            leader
> Node count:                   4            -            4
> Proposal sizes last/min/max:  n/a          -            36/36/36
>
> Any help with this issue will be greatly appreciated!
>
> --
> Vik
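The per-node figures quoted above look like the output of ZooKeeper's `srvr` four-letter-word command (an assumption; the thread does not say how they were collected). A minimal Python sketch for collecting them from every server and reducing each reply to mode, zxid and zxid epoch (the high 32 bits), so the nodes can be compared after a failover. The helper names `parse_srvr`, `fetch_srvr`, `summarize` and `poll`, the hostnames and the default client port 2181 are illustrative assumptions. On 3.5.x the command must also be permitted by `4lw.commands.whitelist`; by default only `srvr` is whitelisted.

```python
import socket
from typing import Dict, Iterable


def parse_srvr(text: str) -> Dict[str, str]:
    """Turn the 'Key: value' lines of srvr-style output into a dict."""
    stats: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only; the version line contains more colons.
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def fetch_srvr(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'srvr' four-letter word and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def summarize(host: str, stats: Dict[str, str]) -> str:
    """One-line summary: mode, zxid and the epoch held in the zxid's high 32 bits."""
    if "Zxid" not in stats:
        # e.g. the command is not whitelisted, or the node is not serving requests
        return f"{host}: no Zxid in reply"
    zxid = int(stats["Zxid"], 16)
    return f"{host}: mode={stats.get('Mode', '?')} zxid={zxid:#x} epoch={zxid >> 32}"


def poll(hosts: Iterable[str], port: int = 2181) -> None:
    """Print one summary line per server; connection errors are reported, not raised."""
    for host in hosts:
        try:
            print(summarize(host, parse_srvr(fetch_srvr(host, port))))
        except OSError as exc:
            print(f"{host}: unreachable ({exc})")
```

For example, `poll(["zoo1", "zoo2", "zoo3"])` prints one line per server. Against the final 3.5.8 state above one would expect Zoo1 to report epoch 1 and Zoo3 epoch 2; against the final 3.4.14 state, both report epoch 2.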