zookeeper-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: One node crashing in 3.4.11 triggered a full ensemble restart
Date Wed, 02 Oct 2019 19:29:02 GMT
Have you tried stopping the node, deleting the data and log directories, upgrading to 3.5.5,
starting the node, and waiting until it is synchronized?
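
For the "wait until it is synchronized" step, one way to check is to poll the node with the
"srvr" four-letter command and look at the "Mode:" line. The following is only a minimal
sketch under assumptions: the helper names (parse_mode, query_srvr, is_serving) and the
default client port 2181 are illustrative and not taken from this thread, and "srvr" must be
permitted by the node's four-letter-word whitelist. A node reporting follower or leader mode
is serving requests, which is treated here as a rough proxy for having synced.

```python
import socket


def parse_mode(srvr_output):
    """Return the value of the 'Mode:' line in srvr output, or None if absent."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_srvr(host, port=2181, timeout=5.0):
    """Send the 'srvr' four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_serving(host, port=2181):
    """True if the node reports a quorum role; False if unreachable or not serving."""
    try:
        mode = parse_mode(query_srvr(host, port))
    except OSError:
        return False
    return mode in ("leader", "follower", "observer")
```

Calling is_serving("zk-node-3") in a loop with a short sleep (the hostname is a placeholder)
would tell you when the rebuilt node has rejoined the quorum.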

> On 02.10.2019 at 20:14, Jerry Hebert <jerry.hebert@gmail.com> wrote:
> 
> Hi all,
> 
> My first post here! I'm hoping you all might be able to offer some guidance
> or redirect me to an existing ticket. We have a five node ensemble on
> 3.4.11 that we're currently in the process of upgrading to 3.5.5. We
> recently saw some bizarre behavior in our ensemble that I was hoping to
> find some sort of pre-existing ticket or discussion about, but I was having
> difficulty finding hits for this in Jira.
> 
> The behavior that we saw from our metrics is that one of our nodes (not
> sure if it was a follower or a leader) started to demonstrate
> instability (high CPU, high RAM) and it crashed. Not a big deal, but as
> soon as it crashed, the other four nodes all immediately restarted,
> resulting in a short outage. One node crashing should never cause an
> ensemble restart of course, so I assumed that this must be a bug in ZK. The
> nodes that restarted had no indication of errors in their logs; they
> simply restarted. Does this sound familiar to any of you?
> 
> Also, we are using Exhibitor on that ensemble so it's also possible that
> the restart was caused by Exhibitor.
> 
> My hope is that this issue will be behind us once the 3.5.5 upgrade is
> complete but I'd ideally like to find some concrete evidence of this.
> 
> Thanks!
> Jerry
