zookeeper-user mailing list archives

From Alexander Shraer <shralex@gmail.com>
Subject Re: Changing leader to follower?
Date Mon, 13 Oct 2014 16:11:28 GMT
P.S. That was 1 second in a cluster; over a WAN I believe the benefit of the
current simple implementation will be bigger.

On Mon, Oct 13, 2014 at 9:06 AM, Alexander Shraer <shralex@gmail.com> wrote:

> I agree that such a feature could be very useful.
>
> >one could announce to the other nodes that the leader is retiring, so
> there’s no need to wait for failed heartbeat responses to realize that the
> leader is no longer serving.
>
> This is actually what happens when the leader steps down during a reconfig
> operation (such as when changing the leading port, removing the leader or
> making it an observer), so it should be possible to add an explicit command
> to trigger this mechanism as you suggest, if someone wants to take on this
> implementation.
>
> It saved about 1 second in my experiments (which is probably the timeout
> you mention plus a few rounds of fast leader election), but it can still be
> optimized further. For example, for simplicity I still go back to leader
> election, with an initial vote indicating who the new designated leader
> should be, so even though leader election terminates after one round, it
> is not avoided entirely, as it could be.
>
> On Mon, Oct 13, 2014 at 8:25 AM, Jeff Potter
> <jpotter-zookeeper@codepuppy.com> wrote:
>
>>
>> We’re using zookeeper cross-DC to coordinate communication of data that’s
>> served to our iOS app via HTTP API calls — in this case, the hosts that the
>> app should be connecting to for chat. Chat nodes get added into the
>> cluster, register themselves in zookeeper; meanwhile, clients issue API
>> calls to web servers that return a list of chat nodes that the client
>> should be connecting to. There are a few other global settings that
>> we also coordinate via zookeeper, but that stuff could, in theory, be
>> applied manually to each of the DCs, since changes to it are manual. (We
>> also run cassandra cross-DC, so we already have dependencies on talking
>> cross-DC; hence two main DCs and a tie-breaker third DC that also serves as
>> a back-up DC.)
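>>
>> In sketch form, the registration pattern boils down to something like the
>> following (the znode path, hosts, and timeout are illustrative
>> placeholders, and the parent /chat-nodes znode is assumed to exist):
>>
>>     import org.apache.zookeeper.*;
>>     import java.util.List;
>>
>>     public class ChatRegistrySketch {
>>         public static void main(String[] args) throws Exception {
>>             ZooKeeper zk = new ZooKeeper(
>>                     "zk1:2181,zk2:2181,zk3:2181", 30000, event -> {});
>>
>>             // Chat node side: an ephemeral znode vanishes when the
>>             // session dies, so dead nodes drop out automatically.
>>             zk.create("/chat-nodes/chat1.example.com", new byte[0],
>>                     ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
>>
>>             // Web tier side: answer the HTTP API call with the live set.
>>             List<String> live = zk.getChildren("/chat-nodes", false);
>>             System.out.println(live);
>>         }
>>     }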
>>
>> I’ve seen SmartStack before, and it seems like a good potential solution
>> at larger scales, but at our current size / capacity, registering directly
>> on top of zookeeper is lightweight and simple enough. I haven’t seen the
>> Pinterest writeup; thanks for sending it!
>>
>> You’d asked about frequency of leader elections. We don’t see leader
>> elections happening that often — the only time they come up is when we do
>> something to take down the current leader, which is very, very rare — our
>> deploys don’t need to restart that service. So far, the only time it’s
>> happened in a year+ is the XEN-108 bug that caused the node to reboot.
>>
>> To be clear, we’re “okay” with the leader re-election time; I’m just
>> surprised that it’s as choppy as it is, and we were surprised, looking
>> through the “service zookeeper stop” target, at how it was implemented. I
>> would think there’d be some benefit to having a leader “step down”, in that
>> one could announce to the other nodes that the leader is retiring, so
>> there’s no need to wait for failed heartbeat responses to realize that the
>> leader is no longer serving.
>>
>> -Jeff
>>
>>
>> On Oct 11, 2014, at 2:09 PM, ralph tice <ralph.tice@gmail.com> wrote:
>>
>> > I'm not an expert, but I don't think there is a magic bullet here:
>> > leader election has to happen in this circumstance, and that takes time.
>> >
>> > You may be better served by building in resilience so that ZooKeeper's
>> > uptime isn't a single point of failure for your services layer.
>> > Pinterest and Airbnb both have some prior art here:
>> > http://engineering.pinterest.com/post/77933733851/zookeeper-resilience-at-pinterest
>> > and http://nerds.airbnb.com/smartstack-service-discovery-cloud/
>> >
>> > I'm curious why you chose a cross-DC ensemble versus localized
>> > same-region ensembles. Don't you deal with a significant frequency
>> > of leader elections from being in 3 regions anyway?
>> >
>> >
>> > On Sat, Oct 11, 2014 at 11:21 AM, Jeff Potter
>> > <jpotter-zookeeper@codepuppy.com> wrote:
>> >
>> >>
>> >> The reason I ask is that we’ve noticed, when running zookeeper
>> >> cross-DC, that restarting the node that’s currently the leader causes
>> >> a brief but real service interruption for 3 to 5 seconds while the
>> >> rest of the cluster elects a new leader and syncs. We’re on AWS, with
>> >> 2 ZK nodes in US-East, 2 in US-West-2, and 1 in US-West (as a
>> >> tie-breaker).
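>> >>
>> >> For concreteness, the server list in a zoo.cfg for that layout would
>> >> be shaped roughly like this (hostnames are placeholders):
>> >>
>> >>     server.1=zk-use1-a.example.com:2888:3888
>> >>     server.2=zk-use1-b.example.com:2888:3888
>> >>     server.3=zk-usw2-a.example.com:2888:3888
>> >>     server.4=zk-usw2-b.example.com:2888:3888
>> >>     server.5=zk-usw1-a.example.com:2888:3888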
>> >>
>> >> It would seem that taking a leader to follower status would be useful,
>> >> and doing so without it actually being a stop / disconnect for all
>> >> clients connected to the node. (Especially for doing rolling restarts
>> >> of all nodes, e.g. the XEN-108 bug.)
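>> >>
>> >> For rolling restarts, one way to restart followers first and the
>> >> leader last is to ask each server for its role with the "stat"
>> >> four-letter command (assuming four-letter words are enabled); a quick
>> >> sketch:
>> >>
>> >>     import java.io.*;
>> >>     import java.net.Socket;
>> >>
>> >>     public class ZkModeCheck {
>> >>         // Returns "leader", "follower", etc. for host:port, or
>> >>         // "unknown" if no Mode line comes back.
>> >>         static String mode(String host, int port) throws IOException {
>> >>             try (Socket s = new Socket(host, port)) {
>> >>                 s.getOutputStream().write("stat".getBytes());
>> >>                 s.getOutputStream().flush();
>> >>                 BufferedReader in = new BufferedReader(
>> >>                         new InputStreamReader(s.getInputStream()));
>> >>                 String line;
>> >>                 while ((line = in.readLine()) != null)
>> >>                     if (line.startsWith("Mode:"))
>> >>                         return line.substring(5).trim();
>> >>             }
>> >>             return "unknown";
>> >>         }
>> >>
>> >>         public static void main(String[] args) throws IOException {
>> >>             System.out.println(mode(args[0],
>> >>                     Integer.parseInt(args[1])));
>> >>         }
>> >>     }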
>> >>
>> >> -Jeff
>> >>
>> >>
>> >>
>> >> On Oct 10, 2014, at 10:16 AM, Ivan Kelly <ivank@apache.org> wrote:
>> >>
>> >>> Or just pause the process until someone else takes over.
>> >>>
>> >>> 1. kill -STOP <zookeeper_pid>
>> >>> 2. # wait for election to happen
>> >>> 3. kill -CONT <zookeeper_pid>
>> >>>
>> >>> This won't stop it from becoming leader again. Also, clients may
>> >>> migrate to other servers.
>> >>>
>> >>> -Ivan
>> >>>
>> >>> Alexander Shraer writes:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> I don't think there's a direct way, although this seems a useful
>> >>>> thing to add.
>> >>>>
>> >>>> One thing you could do is to issue a reconfig changing the leader's
>> >>>> leading/quorum port (through which it talks with the followers).
>> >>>> This will cause it to give up leadership while keeping it in the
>> >>>> cluster.
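>> >>>>
>> >>>> With a 3.5+ client that might look something like the sketch below
>> >>>> (host, ports, and auth are placeholders; recent clients expose
>> >>>> reconfig via ZooKeeperAdmin):
>> >>>>
>> >>>>     import org.apache.zookeeper.admin.ZooKeeperAdmin;
>> >>>>     import org.apache.zookeeper.data.Stat;
>> >>>>
>> >>>>     public class LeaderStepDownSketch {
>> >>>>         public static void main(String[] args) throws Exception {
>> >>>>             ZooKeeperAdmin admin = new ZooKeeperAdmin(
>> >>>>                     "zk1:2181", 30000, event -> {});
>> >>>>             // Reconfig is restricted; placeholder credentials here.
>> >>>>             admin.addAuthInfo("digest", "super:secret".getBytes());
>> >>>>
>> >>>>             // Re-submit the leader's spec (server.1 here) with a
>> >>>>             // new quorum port (2889 instead of 2888): it gives up
>> >>>>             // leadership but remains a voting participant.
>> >>>>             admin.reconfigure(
>> >>>>                 "server.1=zk1.example.com:2889:3888:participant;2181",
>> >>>>                 null, null, -1, new Stat());
>> >>>>         }
>> >>>>     }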
>> >>>>
>> >>>> Cheers,
>> >>>> Alex
>> >>>>
>> >>>> On Fri, Oct 10, 2014 at 5:57 AM, Jeff Potter
>> >>>> <jpotter-zookeeper@codepuppy.com> wrote:
>> >>>>
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> Is there a way to “retire” a leader while keeping it in the
>> >>>>> cluster?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Jeff
