zookeeper-user mailing list archives

From Alexander Shraer <shra...@gmail.com>
Subject Re: Changing leader to follower?
Date Fri, 17 Oct 2014 00:24:19 GMT
It would actually be interesting if you could run this experiment in your
system: try demoting the leader using reconfig (changing its leading port,
making it an observer, or removing it) and measure how much time that saves
compared to shutting it down and having some other replica take over as
leader.
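
For concreteness, here is a rough, untested sketch of what that experiment
could look like. It assumes a 3.5.x ensemble with dynamic reconfiguration
available (later 3.5 releases also require reconfigEnabled=true), GNU date
and nc on the path, and placeholder host names: zk1 is the current leader
(server.1) and zk2/zk3 are the remaining servers you poll.

    # Demote the leader to an observer; re-adding a server under its
    # existing id replaces its specification with the new one.
    START=$(date +%s%N)
    zkCli.sh -server zk2:2181 reconfig -add 'server.1=zk1:2888:3888:observer;2181'

    # Poll the remaining participants until one reports itself leader,
    # then print how long the handover took.
    while true; do
      for h in zk2 zk3; do
        if echo srvr | nc "$h" 2181 | grep -q 'Mode: leader'; then
          echo "new leader: $h after $(( ($(date +%s%N) - START) / 1000000 )) ms"
          exit 0
        fi
      done
      sleep 0.05
    done

Running the same polling loop after a plain zkServer.sh stop on the leader
gives the baseline to compare against.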

On Mon, Oct 13, 2014 at 9:11 AM, Alexander Shraer <shralex@gmail.com> wrote:

> P.S. That was about 1 second within a single cluster; over a WAN, I believe
> the benefit of even the current simple implementation will be bigger.
>
> On Mon, Oct 13, 2014 at 9:06 AM, Alexander Shraer <shralex@gmail.com>
> wrote:
>
>> I agree that such a feature could be very useful.
>>
>> >one could announce to the other nodes that the leader is retiring, so
>> there’s no need to wait for failed heartbeat responses to realize that the
>> leader is no longer serving.
>>
>> This is actually what happens when the leader steps down during a
>> reconfig operation (such as when changing the leading port, removing the
>> leader or making it an observer), so it should be possible to add an
>> explicit command to trigger this mechanism as you suggest, if someone wants
>> to take on this implementation.
>>
>> It saved about 1 second in my experiments (which probably corresponds to
>> the timeout you mention plus a few rounds of fast leader election), but it
>> can still be optimized further. For example, for simplicity I still go back
>> to leader election, with an initial vote indicating who the new designated
>> leader should be; so even though leader election terminates after one
>> round, it is not completely avoided, as it could be.
>>
>> On Mon, Oct 13, 2014 at 8:25 AM, Jeff Potter <jpotter-zookeeper@codepuppy.com> wrote:
>>
>>>
>>> We’re using zookeeper cross-DC to coordinate communication of data
>>> that’s served to our iOS app via HTTP API calls — in this case, the hosts
>>> that the app should be connecting to for chat. Chat nodes get added into
>>> the cluster, register themselves in zookeeper; meanwhile, clients issue API
>>> calls to web servers that return a list of chat nodes that the client
>>> should be connecting to. There are a few other global settings that we
>>> also coordinate via zookeeper, but those could, in theory, be manually
>>> applied to each of the DCs, since changes to them are manual. (We
>>> also run cassandra cross-DC, so we already have dependencies on talking
>>> cross-DC; hence two main DCs and a tie-breaker third DC that also serves as
>>> a back-up DC.)
>>>
>>> I’ve seen SmartStack before, and it seems like a good potential solution
>>> at larger scales, but at our current size / capacity, registering directly
>>> on top of zookeeper is lightweight and simple enough. I hadn’t seen the
>>> Pinterest writeup; thanks for sending it!
>>>
>>> You’d asked about frequency of leader elections. We don’t see leader
>>> elections happening that often — the only time they come up is when we do
>>> something to take down the current leader, which is very, very rare — our
>>> deploys don’t need to restart that service. So far, the only time it’s
>>> happened in a year+ was the XEN-108 bug that caused the node to reboot.
>>>
>>> To be clear, we’re “okay” with the leader re-election time; I’m just
>>> surprised that it’s as choppy as it is, and by how the “service zookeeper
>>> stop” target turned out to be implemented when we looked through it. I
>>> would think there’d be some benefit to having a leader “step down”, in
>>> that one could announce to the other nodes that the leader is retiring,
>>> so there’s no need to wait for failed heartbeat responses to realize that
>>> the leader is no longer serving.
>>>
>>> -Jeff
>>>
>>>
>>> On Oct 11, 2014, at 2:09 PM, ralph tice <ralph.tice@gmail.com> wrote:
>>>
>>> > I'm not an expert, but I don't think there is a magic bullet here:
>>> > leader election has to happen in this circumstance, and that takes time.
>>> >
>>> > You may be better served by building better resilience, so that
>>> > ZooKeeper's uptime stops being a single point of failure in your
>>> > services layer.  Pinterest and Airbnb both have some prior art here:
>>> >
>>> > http://engineering.pinterest.com/post/77933733851/zookeeper-resilience-at-pinterest
>>> > and http://nerds.airbnb.com/smartstack-service-discovery-cloud/
>>> >
>>> > I'm curious why you chose a cross-DC ensemble versus localized
>>> > same-region ensembles.  Don't you deal with a significant frequency of
>>> > leader elections from being in 3 regions anyway?
>>> >
>>> >
>>> > On Sat, Oct 11, 2014 at 11:21 AM, Jeff Potter <jpotter-zookeeper@codepuppy.com> wrote:
>>> >
>>> >>
>>> >> The reason I ask is that we’ve noticed, when running zookeeper
>>> >> cross-DC, that restarting the node that’s currently the leader causes
>>> >> a brief but real service interruption for 3 to 5 seconds while the
>>> >> rest of the cluster elects a new leader and syncs. We’re on AWS, with
>>> >> 2 ZK nodes in US-East, 2 in US-West-2, and 1 in US-West (as a
>>> >> tie-breaker).
>>> >>
>>> >> It would seem that taking a leader down to follower status would be
>>> >> useful, and doing so without it being an actual stop / disconnect for
>>> >> all clients connected to the node. (Especially for doing rolling
>>> >> restarts of all nodes, e.g. for the XEN-108 bug.)
>>> >>
>>> >> -Jeff
>>> >>
>>> >>
>>> >>
>>> >> On Oct 10, 2014, at 10:16 AM, Ivan Kelly <ivank@apache.org> wrote:
>>> >>
>>> >>> Or just pause the process until someone else takes over.
>>> >>>
>>> >>> 1. kill -STOP <zookeeper_pid>
>>> >>> 2. // wait for election to happen
>>> >>> 3. kill -CONT <zookeeper_pid>
>>> >>>
>>> >>> This won't stop it from becoming leader again. Also, clients may
>>> >>> migrate to other servers.
>>> >>>
>>> >>> -Ivan
>>> >>>
>>> >>> Alexander Shraer writes:
>>> >>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> I don't think there's a direct way, although this seems like a
>>> >>>> useful thing to add.
>>> >>>>
>>> >>>> One thing you could do is to issue a reconfig changing the leader's
>>> >>>> leading/quorum port (through which it talks with the followers).
>>> >>>> This will cause it to give up leadership while keeping it in the
>>> >>>> cluster.
>>> >>>>
>>> >>>> Cheers,
>>> >>>> Alex
>>> >>>>
>>> >>>> On Fri, Oct 10, 2014 at 5:57 AM, Jeff Potter <jpotter-zookeeper@codepuppy.com> wrote:
>>> >>>>
>>> >>>>>
>>> >>>>> Hi,
>>> >>>>>
>>> >>>>> Is there a way to “retire” a leader while keeping it in the cluster?
>>> >>>>>
>>> >>>>> Thanks,
>>> >>>>> Jeff
>>> >>
>>> >>
>>>
>>>
>>
>
