zookeeper-user mailing list archives

From Camille Fournier <cami...@apache.org>
Subject Re: Leader election duration
Date Wed, 29 Apr 2015 17:32:01 GMT
Don't suppose you could share some snippets of the log messages you're
seeing a lot?

On Wed, Apr 29, 2015 at 1:29 PM, Karol Dudzinski <karoldudzinski@gmail.com>
wrote:

> Did a bit more investigation today though I didn't try the prod snapshot
> in a test cluster.  I did however find a couple of other strange things
> which I'd initially missed.
>
> When the leader of this particular cluster goes for a reboot, I see many
> notification printouts from FastLeaderElection.  On some occasions I've
> seen nearly 50.  In other clusters I usually see 5 or thereabouts.  The
> number of these notifications seems to correlate with how long election
> took.  What could cause it to require exchanging so many messages?
>
> This cluster also often seems to time out while syncing after selecting the
> leader.  On the new leader I see the expected number of "Follower sid: ..."
> messages but it doesn't always get to the "Synchronizing with Follower"
> part.  On the leader I then get an "Unexpected exception causing shutdown
> while sock still open" message followed by a GOODBYE for one of the
> followers.  At the same time on that follower I see an "Exception when
> following the leader" caused by a read timeout in
> Learner.registerWithLeader.  Does anything happen prior to syncing that
> could take a long time?
>
> Thanks,
> Karol
>
> > On 28 Apr 2015, at 20:22, Karol Dudzinski <karoldudzinski@gmail.com> wrote:
> >
> > Well these are prod clusters so my ability to experiment is rather
> > limited.  I can take a copy of the snapshot and try both 3 node and 5 in a
> > test cluster.
> >
> > One thing I forgot to mention is that in most clusters the number of
> > election notification log lines I see is typically, give or take, the same
> > as the number of participants.  In this cluster however, it's typically 2
> > or 3 times as many notifications as the number of participants.
> >
> > My gut feeling is it's more likely to be due to load as the 5 node
> > cluster is much busier and the election time has been increasing over time
> > (as has load).  I have no idea exactly what load though, whether it's
> > number of clients, frequency of transactions, total data size, etc.  I
> > don't understand why load would matter, though that may just be my limited
> > knowledge of the election protocol.
> >
> > Karol
> >
> >> On 28 Apr 2015, at 19:54, Camille Fournier <camille@apache.org> wrote:
> >>
> >> Just out of curiosity, if you start the 5 node cluster up with only 3 of
> >> the nodes to begin with (like, config 5, but only bring up 3 processes),
> >> does it speed up the leader election or is it still slow?
> >>
> >> C
> >>
> >> On Tue, Apr 28, 2015 at 1:41 PM, Karol Dudzinski
> >> <karoldudzinski@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> We're seeing some rather strange leader election in one of our clusters.
> >>> The duration reported by the "FOLLOWING - LEADER ELECTION TOOK" log line
> >>> (and equivalent for the leader) seems to vary hugely.  During one rolling
> >>> reboot, I saw the number reported as small as 39ms and as large as 57
> >>> seconds (difference in units is not a typo).  The average is just about 10
> >>> seconds and std dev also about 10 seconds.  So the time taken is not only
> >>> quite large, it's also very variable.
> >>>
> >>> We have other clusters but the average election time in those is in the
> >>> hundreds of millis with std dev in a similar ballpark.  I guess one
> >>> difference is the "slow" cluster is 5 participants while the others are 3,
> >>> which may be a factor but I wouldn't expect it to make two orders of
> >>> magnitude difference!
> >>>
> >>> So my question is, what factors contribute to the election time reported
> >>> by these log lines? And what can we do to speed this up?
> >>>
> >>> As far as I understand from logs and a quick browse through the code,
> >>> that time is the time to select a leader.  Syncing up to the leader
> >>> happens after that.  The syncing part I can understand will vary depending
> >>> on load but I don't see why selecting the leader would.
> >>>
> >>> Thanks,
> >>> Karol
>
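
The election-time figures quoted above (minimum, maximum, average and std dev) can be gathered from the server logs with a short script, which makes it easier to compare clusters side by side. What follows is only a minimal sketch and is not taken from any message in the thread. It assumes each relevant log line contains text of the form "<ROLE> - LEADER ELECTION TOOK - <number>" and that the number is in milliseconds. Only the "FOLLOWING - LEADER ELECTION TOOK" prefix is quoted in the thread, so the rest of the pattern, the sample lines, and the helper names are hypothetical and should be checked against real logs.

```python
import re
import statistics

# Assumed message shape: "<ROLE> - LEADER ELECTION TOOK - <number>".
# Only the "FOLLOWING - LEADER ELECTION TOOK" prefix appears in the thread;
# the trailing "- <number>" part is an assumption about the log format.
ELECTION_RE = re.compile(r"(\w+) - LEADER ELECTION TOOK - (\d+)")


def election_durations_ms(lines):
    """Collect the durations (assumed milliseconds) from matching log lines."""
    durations = []
    for line in lines:
        match = ELECTION_RE.search(line)
        if match:
            durations.append(int(match.group(2)))
    return durations


def summarize(durations):
    """Return count, min, max, mean and sample std dev, or None if empty."""
    if not durations:
        return None
    return {
        "count": len(durations),
        "min": min(durations),
        "max": max(durations),
        "mean": statistics.mean(durations),
        # The sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical sample lines; real input would be read from the log files.
    sample = [
        "2015-04-28 17:00:01 INFO FOLLOWING - LEADER ELECTION TOOK - 39",
        "2015-04-28 17:05:12 INFO LEADING - LEADER ELECTION TOOK - 57000",
        "2015-04-28 17:05:13 INFO some unrelated log line",
    ]
    print(summarize(election_durations_ms(sample)))
```

Running the same script over the logs of a "fast" and a "slow" cluster gives numbers that can be compared directly. The same approach could be extended to count the FastLeaderElection notification lines per election, once their exact format in the logs is confirmed.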
