zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Flavio Junqueira <fpjunque...@yahoo.com.INVALID>
Subject Re: Leader election duration
Date Wed, 29 Apr 2015 17:46:37 GMT
Check the size of your snapshot files. If during election you need to transfer snapshots and
they are large, then recovery is going to take longer. We try to send a diff of the txns when
possible, and sending a diff is much faster compared to send a full snapshot. Sending a diff
isn't always possible, depending on how far behind followers compared to the leader, so the
difference might be due to snapshots and diffs.
-Flavio  


     On Wednesday, April 29, 2015 6:32 PM, Camille Fournier <camille@apache.org> wrote:
   
 

 Don't suppose you could share some snippets of the log messages you're
seeing a lot?

On Wed, Apr 29, 2015 at 1:29 PM, Karol Dudzinski <karoldudzinski@gmail.com>
wrote:

> Did a bit more investigation today though I didn't try the prod snapshot
> in a test cluster.  I did however find a couple of other strange things
> which I'd initially missed.
>
> When the leader of this particular cluster goes for a reboot, I see many
> notification printouts from FastLeaderElection.  On some occasions I've
> seen nearly 50.  In other clusters I usually see 5 or thereabouts.  The
> number of these notifications seems to correlate with how long election
> took.  What could cause it to require exchanging so many messages?
>
> This cluster also seems to often timeout while syncing after selecting the
> leader.  On the new leader I see the expected number of "Follower sid: ..."
> messages but it doesn't always get to the "Synchronizing with Follower"
> part.  On the leader I then get a "Unexpected exception causing shutdown
> while sock still open" message followed by a GOODBYE for one of the
> followers.  At the same time on that follower I see a "Exception when
> following the leader" caused by a read timeout in
> Learner.registerWithLeader.  Does anything happen prior to syncing that
> could take a long time?
>
> Thanks,
> Karol
>
> > On 28 Apr 2015, at 20:22, Karol Dudzinski <karoldudzinski@gmail.com>
> wrote:
> >
> > Well these are prod clusters so my ability to experiment is rather
> limited.  I can take a copy of the snapshot and try both 3 node and 5 in a
> test cluster.
> >
> > One thing I forgot to mention is that in most clusters the number of
> election notification log lines I see is typically, give or take, the same
> as the number of participants.  In this cluster however, it's typically 2
> or 3 times as many notifications as the number of participants.
> >
> > My gut feeling is it's more likely to be due to load as the 5 node
> cluster is much busier and the election time has been increasing over time
> (as has load).  I have no idea exactly what load though, whether it's
> number of clients, frequency of transactions, total data size, etc.  I
> don't understand why though but that may just be my limited knowledge of
> the election protocol.
> >
> > Karol
> >
> >> On 28 Apr 2015, at 19:54, Camille Fournier <camille@apache.org> wrote:
> >>
> >> Just out of curiosity, if you start the 5 node cluster up with only 3 of
> >> the nodes to begin with (like, config 5, but only bring up 3 processes),
> >> does it speed up the leader election or is it still slow?
> >>
> >> C
> >>
> >> On Tue, Apr 28, 2015 at 1:41 PM, Karol Dudzinski <
> karoldudzinski@gmail.com>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> We're seeing some rather strange leader election in one of our
> clusters.
> >>> The duration reported by the "FOLLOWING - LEADER ELECTION TOOK" log
> line
> >>> (and equivalent for the leader) seems to vary hugely.  During one
> rolling
> >>> reboot, I saw the number reported as small as 39ms and as large as 57
> >>> seconds (difference in units is not a typo).  The average is just
> about 10
> >>> seconds and std dev also about 10 seconds.  So the time taken is not
> only
> >>> quite large, it's also very variable.
> >>>
> >>> We have other clusters but the average election time in those is in the
> >>> hundreds of millis with std dev in a similar ballpark.  I guess one
> >>> difference is the "slow" cluster is 5 participants while the others
> are 3,
> >>> which may be a factor but I wouldn't expect it to make two orders of
> >>> magnitude difference!
> >>>
> >>> So my question is, what factors contribute to the election time
> reported
> >>> by these log lines? And what can we do to speed this up?
> >>>
> >>> As far as I understand from logs and a quick browse through the code
> that
> >>> time is the time to select a leader.  Syncing up to the leader happens
> >>> after that.  The syncing part I can understand will vary depending on
> load
> >>> but I don't see why selecting the leader would.
> >>>
> >>> Thanks,
> >>> Karol
>


 
  
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message