170MB isn't that large. I'm thinking that the election might be taking some time because of
the issue described in ZK-2164. In that jira, the election was failing, but I can see the
blocking call causing the election to take longer as well. You may want to test that fix and
see if it makes any difference to you.
-Flavio
> On 29 Apr 2015, at 19:05, Karol Dudzinski <karoldudzinski@gmail.com> wrote:
>
> The snapshots are a bit larger than I'd like (~170MB) but the bit that takes time is
actually before that.
>
> On a successful election, on the new leader I see the following:
> - LEADER ELECTION TOOK: 12345
> - Synchronizing with Follower...
> - Sending DIFF or SNAP
>
> That last bit will take a long time if the snapshot is large, agreed. However, for me
it's arriving at the LEADER ELECTION TOOK bit that takes a long time. And the bit between
that and the Synchronizing often times out with the messages I described in my previous email.
>
> Karol
>
>> On 29 Apr 2015, at 18:46, Flavio Junqueira <fpjunqueira@yahoo.com.INVALID>
wrote:
>>
>> Check the size of your snapshot files. If during election you need to transfer snapshots
and they are large, then recovery is going to take longer. We try to send a diff of the txns
when possible, and sending a diff is much faster compared to send a full snapshot. Sending
a diff isn't always possible, depending on how far behind followers compared to the leader,
so the difference might be due to snapshots and diffs.
>> -Flavio
>>
>>
>> On Wednesday, April 29, 2015 6:32 PM, Camille Fournier <camille@apache.org>
wrote:
>>
>>
>>
>> Don't suppose you could share some snippets of the log messages you're
>> seeing a lot?
>>
>> On Wed, Apr 29, 2015 at 1:29 PM, Karol Dudzinski <karoldudzinski@gmail.com>
>> wrote:
>>
>>> Did a bit more investigation today though I didn't try the prod snapshot
>>> in a test cluster. I did however find a couple of other strange things
>>> which I'd initially missed.
>>>
>>> When the leader of this particular cluster goes for a reboot, I see many
>>> notification printouts from FastLeaderElection. On some occasions I've
>>> seen nearly 50. In other clusters I usually see 5 or thereabouts. The
>>> number of these notifications seems to correlate with how long election
>>> took. What could cause it to require exchanging so many messages?
>>>
>>> This cluster also seems to often timeout while syncing after selecting the
>>> leader. On the new leader I see the expected number of "Follower sid: ..."
>>> messages but it doesn't always get to the "Synchronizing with Follower"
>>> part. On the leader I then get a "Unexpected exception causing shutdown
>>> while sock still open" message followed by a GOODBYE for one of the
>>> followers. At the same time on that follower I see a "Exception when
>>> following the leader" caused by a read timeout in
>>> Learner.registerWithLeader. Does anything happen prior to syncing that
>>> could take a long time?
>>>
>>> Thanks,
>>> Karol
>>>
>>>>> On 28 Apr 2015, at 20:22, Karol Dudzinski <karoldudzinski@gmail.com>
>>>> wrote:
>>>>
>>>> Well these are prod clusters so my ability to experiment is rather
>>> limited. I can take a copy of the snapshot and try both 3 node and 5 in a
>>> test cluster.
>>>>
>>>> One thing I forgot to mention is that in most clusters the number of
>>> election notification log lines I see is typically, give or take, the same
>>> as the number of participants. In this cluster however, it's typically 2
>>> or 3 times as many notifications as the number of participants.
>>>>
>>>> My gut feeling is it's more likely to be due to load as the 5 node
>>> cluster is much busier and the election time has been increasing over time
>>> (as has load). I have no idea exactly what load though, whether it's
>>> number of clients, frequency of transactions, total data size, etc. I
>>> don't understand why though but that may just be my limited knowledge of
>>> the election protocol.
>>>>
>>>> Karol
>>>>
>>>>> On 28 Apr 2015, at 19:54, Camille Fournier <camille@apache.org>
wrote:
>>>>>
>>>>> Just out of curiosity, if you start the 5 node cluster up with only 3
of
>>>>> the nodes to begin with (like, config 5, but only bring up 3 processes),
>>>>> does it speed up the leader election or is it still slow?
>>>>>
>>>>> C
>>>>>
>>>>> On Tue, Apr 28, 2015 at 1:41 PM, Karol Dudzinski <
>>> karoldudzinski@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We're seeing some rather strange leader election in one of our
>>> clusters.
>>>>>> The duration reported by the "FOLLOWING - LEADER ELECTION TOOK" log
>>> line
>>>>>> (and equivalent for the leader) seems to vary hugely. During one
>>> rolling
>>>>>> reboot, I saw the number reported as small as 39ms and as large as
57
>>>>>> seconds (difference in units is not a typo). The average is just
>>> about 10
>>>>>> seconds and std dev also about 10 seconds. So the time taken is
not
>>> only
>>>>>> quite large, it's also very variable.
>>>>>>
>>>>>> We have other clusters but the average election time in those is
in the
>>>>>> hundreds of millis with std dev in a similar ballpark. I guess one
>>>>>> difference is the "slow" cluster is 5 participants while the others
>>> are 3,
>>>>>> which may be a factor but I wouldn't expect it to make two orders
of
>>>>>> magnitude difference!
>>>>>>
>>>>>> So my question is, what factors contribute to the election time
>>> reported
>>>>>> by these log lines? And what can we do to speed this up?
>>>>>>
>>>>>> As far as I understand from logs and a quick browse through the code
>>> that
>>>>>> time is the time to select a leader. Syncing up to the leader happens
>>>>>> after that. The syncing part I can understand will vary depending
on
>>> load
>>>>>> but I don't see why selecting the leader would.
>>>>>>
>>>>>> Thanks,
>>>>>> Karol
>>
>>
>>
|