zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Zookeeper quorum goes down for no apparent reason in 3.4.5
Date Thu, 05 Oct 2017 16:50:43 GMT
Unfortunately I don't see any attached logs, which makes it difficult to
provide much insight. "Not sufficient followers synced" indicates that
you're losing followers, most likely because they are falling behind. What
are your metrics telling you about load on the compute, disk, memory,
network, etc.? Also look at metrics at the ZK level (e.g. are ZK latencies
increasing?). Check the logs to see if you're seeing "fsync" slowness
issues (it's a warning in the server logs). This is a pretty common issue.
GC might also be an issue, although that's rarer these days (hard to say
w/o knowing your use case, etc.). Again, look to your metrics collection
for insight on where to start.
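
One way to check for the "fsync" warnings is to scan the server logs with a
small script. The following is only a sketch: the function name
find_slow_fsyncs is made up for illustration, and the warning wording it
matches ("fsync-ing the write ahead log in ... took NNNms") is an assumption
about what the server prints, so adjust the pattern if your version logs it
differently.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumed wording of the server's slow-fsync warning (not confirmed in this
# thread); change the pattern if your ZooKeeper version phrases it differently.
FSYNC_RE = re.compile(r"fsync-ing the write ahead log in (\S+) took (\d+)ms")


def find_slow_fsyncs(lines: Iterable[str], min_ms: int = 0) -> List[Tuple[str, int]]:
    """Return (timestamp, elapsed_ms) for each fsync warning of at least min_ms."""
    hits = []
    for line in lines:
        match = FSYNC_RE.search(line)
        if match is None:
            continue
        elapsed_ms = int(match.group(2))
        if elapsed_ms >= min_ms:
            # Log lines start with a 23-character "YYYY-MM-DD HH:MM:SS,mmm" stamp.
            hits.append((line[:23], elapsed_ms))
    return hits


if __name__ == "__main__":
    # Usage: python find_fsync.py /path/to/zookeeper.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for timestamp, elapsed_ms in find_slow_fsyncs(log_file):
            print(f"{timestamp}  fsync took {elapsed_ms} ms")
```

If the timestamps it prints cluster just before the "Shutting down" entries
on the leader, slow disk writes are a good place to start looking.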


On Wed, Oct 4, 2017 at 11:17 AM, Anand Parthasarathy <
anpartha@avinetworks.com> wrote:

> Hi,
> We have an issue with a 3-node zookeeper ensemble where the quorum goes
> down for no apparent reason every once in a while. Here is what I see on
> the ZK leader:
> 2017-09-21 03:00:03,648 [myid:3] - INFO  [QuorumPeer[myid=3]/5002:Leader@493] - Shutting down
> 2017-09-21 03:00:03,648 [myid:3] - INFO  [QuorumPeer[myid=3]/5002:Leader@499] - Shutdown called
> java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced with sids: [ 3 ]
>     at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:499)
>     at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:474)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:799)
> I have attached the logs from the 3 nodes around this time. Could you
> please help us understand what the issue could be here? The only thing I
> see shortly before this timestamp is that all of them ran a PurgeTask at
> pretty much the same time.
> Thanks,
> Anand.
