zookeeper-user mailing list archives

From Szalay-Bekő Máté <szalay.beko.m...@gmail.com>
Subject Re: zookeeper session issue with 3.5.x version
Date Mon, 09 Nov 2020 08:12:35 GMT
Hello Vik,

This issue reminds me of
https://issues.apache.org/jira/browse/ZOOKEEPER-3940
Can you double-check whether you are seeing the same issue? I think
ZOOKEEPER-3940 is Docker-related. Are you using a dockerized ZooKeeper?

If you have a different problem, then I recommend filing a Jira ticket and
attaching debug logs from all three ZooKeeper server processes.

Kind regards,
Mate

On Sat, Nov 7, 2020 at 9:28 PM vikramark s <vikramark.singh@gmail.com>
wrote:

> Hi,
>
> I am relatively new to ZooKeeper and am struggling to resolve an issue we
> are experiencing. We recently upgraded ZooKeeper from 3.4.x to 3.5.8 and
> have since seen problems that we think are related to session sharing
> among nodes.
>
> I was able to recreate the issue with a sample ZooKeeper setup: I cannot
> establish a new session after taking down the leader in a 3-node cluster.
> The same flow works with ZooKeeper 3.4.14 but not with 3.5.8. I am hoping
> there is some setting I am overlooking, as I can't find anyone else
> reporting this online.
>
> Below are the details:
>
> 3-node cluster. After starting all the ZooKeeper nodes:
>
> Zookeeper version (all nodes): 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315,
> built on 05/04/2020 15:07 GMT
>
>                              Zoo1         Zoo2          Zoo3
> Latency min/avg/max          0/0/0        0/0/0         0/0/0
> Received                     3            3             2
> Sent                         2            2             1
> Connections                  1            1             1
> Outstanding                  0            0             0
> Zxid                         0x0          0x100000000   0x100000000
> Mode                         follower     leader        follower
> Node count                   5            5             5
> Proposal sizes last/min/max  -            -1/-1/-1      -
>
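The per-node figures above appear to be the output of ZooKeeper's `srvr` four-letter-word command (`stat` prints the same fields plus a client list). When comparing nodes across several runs like this, it can help to collect them programmatically. A minimal Python sketch, assuming the default client port 2181 and that `srvr` is permitted by `4lw.commands.whitelist` (in 3.5.x it is the only four-letter word enabled by default); `fetch_srvr` and `parse_srvr` are illustrative helper names, not part of ZooKeeper:

```python
import socket


def fetch_srvr(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'srvr' four-letter word to one server and return its reply.

    Assumes 'srvr' is allowed by 4lw.commands.whitelist on that server.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr(text: str) -> dict:
    """Turn 'Key: value' lines into a dict, e.g. {'Mode': 'leader', ...}.

    Only the first ':' splits a line, so values such as the build time
    '15:07' stay intact.
    """
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats
```

For instance, `parse_srvr(fetch_srvr("zoo1"))["Zxid"]` returns that server's `Zxid` field as a string; `zoo1` stands in for whatever host name the first server actually has.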
> After starting one session using zkCli.sh on the Zoo1 node:
>
> Zookeeper version (all nodes): 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315,
> built on 05/04/2020 15:07 GMT
>
>                              Zoo1         Zoo2          Zoo3
> Latency min/avg/max          1/9/23       0/0/0         0/0/0
> Received                     7            4             3
> Sent                         6            3             2
> Connections                  2            1             1
> Outstanding                  0            0             0
> Zxid                         0x100000001  0x100000001   0x100000001
> Mode                         follower     leader        follower
> Node count                   5            5             5
> Proposal sizes last/min/max  -            36/36/36      -
>
> *Note: The Zxid is now consistent across all nodes.*
>
> I then shut down the leader node, zoo2, and Zoo3 became the leader. But
> for some reason the Zxid is not the same on zoo1 and zoo3.
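A ZooKeeper Zxid is a 64-bit number whose high-order 32 bits hold the leader epoch and whose low-order 32 bits count transactions within that epoch, so the mismatch is easier to read once each value is split. A small Python sketch (the helper `split_zxid` is illustrative) applied to the values zoo1 and zoo3 report after the election:

```python
def split_zxid(zxid: str) -> tuple[int, int]:
    """Split a Zxid such as '0x200000000' into (epoch, counter).

    High-order 32 bits: leader epoch.
    Low-order 32 bits: transaction counter within that epoch.
    """
    value = int(zxid, 16)  # int() with base 16 accepts the '0x' prefix
    return value >> 32, value & 0xFFFFFFFF


# Zxid values reported by zoo1 and zoo3 after zoo2 was stopped (3.5.8 run).
for node, zxid in (("zoo1", "0x100000001"), ("zoo3", "0x200000000")):
    epoch, counter = split_zxid(zxid)
    print(f"{node}: epoch={epoch} counter={counter}")
# zoo1: epoch=1 counter=1
# zoo3: epoch=2 counter=0
```

Read this way, zoo3 has opened epoch 2 as the new leader with no transaction committed in it yet, while zoo1 still reports the last transaction it applied in epoch 1.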
>
>
>
> I then closed the existing zkCli and started a new zkCli.sh session on the
> same node (zoo1). The session was not established: the CLI client just
> keeps retrying and has created many outstanding requests on zoo1. The only
> way to recover now is to shut down all nodes and restart them. (Currently,
> if the leader node goes down, our Kafka cluster stops working.)
>
> Zookeeper version (Zoo1 and Zoo3): 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315,
> built on 05/04/2020 15:07 GMT
>
>                              Zoo1         Zoo2          Zoo3
> Latency min/avg/max          0/0/2        -             0/0/0
> Received                     50           -             1
> Sent                         43           -             0
> Connections                  2            -             1
> Outstanding                  6            -             0
> Zxid                         0x100000001  -             0x200000000
> Mode                         follower     down          leader
> Node count                   5            -             5
> Proposal sizes last/min/max  -            -             -1/-1/-1
>
> *Question: Why is the client not able to establish a session on Zoo1?*
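One way to spot this state mechanically is to compare each follower's Zxid epoch with the leader's and look at its `Outstanding` count. The Python sketch below is a heuristic, not a check ZooKeeper itself defines: a lagging epoch alone is not necessarily a fault (the 3.4.14 run below shows the same gap right after the election), so it only flags followers that also have queued requests, which is the combination Zoo1 shows above. The input holds `srvr`-style fields per node, with `None` for a node that is down; the function name `stuck_followers` is hypothetical.

```python
def split_zxid(zxid: str) -> tuple[int, int]:
    """Return (epoch, counter) from a hex Zxid string."""
    value = int(zxid, 16)
    return value >> 32, value & 0xFFFFFFFF


def stuck_followers(ensemble: dict) -> list:
    """Heuristic: followers whose Zxid epoch is behind the leader's
    while they still have outstanding requests queued."""
    leaders = [s for s in ensemble.values() if s and s.get("Mode") == "leader"]
    if not leaders:
        return []  # no leader visible, nothing to compare against
    leader_epoch = split_zxid(leaders[0]["Zxid"])[0]
    flagged = []
    for name, stats in ensemble.items():
        if not stats or stats.get("Mode") != "follower":
            continue  # skip nodes that are down and the leader itself
        epoch = split_zxid(stats["Zxid"])[0]
        if epoch < leader_epoch and int(stats.get("Outstanding", "0")) > 0:
            flagged.append(name)
    return flagged


# srvr fields from the 3.5.8 run after zoo2 was stopped (None = down).
ensemble = {
    "zoo1": {"Mode": "follower", "Zxid": "0x100000001", "Outstanding": "6"},
    "zoo2": None,
    "zoo3": {"Mode": "leader", "Zxid": "0x200000000", "Outstanding": "0"},
}
print(stuck_followers(ensemble))  # ['zoo1']
```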
>
>
>
>
>
> A similar flow with ZooKeeper 3.4.14 works fine. Below are the details:
>
>
>
> Initial setup:
>
> Zookeeper version (all nodes): 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf,
> built on 03/06/2019 16:18 GMT
>
>                              Zoo1         Zoo2          Zoo3
> Latency min/avg/max          0/0/0        0/0/0         0/0/0
> Received                     1            1             1
> Sent                         0            0             0
> Connections                  1            1             1
> Outstanding                  0            0             0
> Zxid                         0x0          0x100000000   0x100000000
> Mode                         follower     leader        follower
> Node count                   4            4             4
> Proposal sizes last/min/max  -            -1/-1/-1      -
>
> After connecting with zkCli on Zoo1:
>
> Zookeeper version (all nodes): 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf,
> built on 03/06/2019 16:18 GMT
>
>                              Zoo1         Zoo2          Zoo3
> Latency min/avg/max          0/14/33      0/0/0         0/0/0
> Received                     5            2             2
> Sent                         4            1             1
> Connections                  2            1             1
> Outstanding                  0            0             0
> Zxid                         0x100000001  0x100000001   0x100000001
> Mode                         follower     leader        follower
> Node count                   4            4             4
> Proposal sizes last/min/max  -            36/36/36      -
>
> *Note: The Zxid is now the same on all the nodes.*
>
> After shutting down the leader node, zoo2, I can see Zoo3 became the
> leader. For some reason the Zxid is initially not the same on zoo1 and
> zoo3: Zoo3 has a new Zxid because a new epoch was started, but zoo1 still
> has the old one.
>
> I closed the existing zkCli and started a new zkCli.sh session on the same
> node (zoo1). This time the session was established and the Zxid was synced
> as well.
>
> Zookeeper version (Zoo1 and Zoo3): 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf,
> built on 03/06/2019 16:18 GMT
>
>                              Zoo1         Zoo2          Zoo3
> Latency min/avg/max          0/1/4        -             0/0/0
> Received                     8            -             3
> Sent                         7            -             2
> Connections                  2            -             1
> Outstanding                  0            -             0
> Zxid                         0x200000001  -             0x200000001
> Mode                         follower     down          leader
> Node count                   4            -             4
> Proposal sizes last/min/max  -            -             36/36/36
>
> Any help with this issue would be greatly appreciated!
>
> --
> Vik
>
