zookeeper-user mailing list archives

From Henry Robinson <he...@cloudera.com>
Subject Re: leader election, scheduled tasks, losing leadership
Date Sun, 09 Dec 2012 05:30:31 GMT
On 8 December 2012 21:18, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:

> If your ConnectionStateListener gets SUSPENDED or LOST you've lost
> connection to ZooKeeper. Therefore you cannot use that same ZooKeeper
> connection to manage a node that denotes the process is running or not.
> Only 1 VM at a time will be running the process. That process can watch for
> SUSPENDED/LOST and wind down the task.
>
>
My point is that by the time that VM sees SUSPENDED/LOST, another VM may
have been elected leader and have started running another process.

It's a classic problem - you need some mechanism to fence a node that
thinks it's the leader, but isn't and hasn't got the memo yet. The way
around the problem is to either ensure that no work is done by you once you
are no longer the leader (perhaps by checking every time you want to do
work), or that the work you do does not affect the system (e.g. by
idempotent work units).
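
As a rough illustration of those two options (re-checking leadership before
each unit of work, and making the units idempotent), here is a minimal Python
sketch. Everything in it is hypothetical: IdempotentStore, run_task and the
is_leader callable are stand-ins invented for this example, not Curator or
ZooKeeper APIs. The leadership check on its own still leaves a window between
the check and the write, which is why the store is also made idempotent.

```python
from typing import Callable, Dict, Iterable, Tuple


class IdempotentStore:
    """Hypothetical result store: applying the same unit twice has no extra effect."""

    def __init__(self) -> None:
        self.results: Dict[str, int] = {}

    def apply(self, unit_id: str, value: int) -> bool:
        # Return True only when this call actually changed the store.
        if unit_id in self.results:
            return False
        self.results[unit_id] = value
        return True


def run_task(
    units: Iterable[Tuple[str, int]],
    is_leader: Callable[[], bool],  # assumption: wraps whatever leadership check you have
    store: IdempotentStore,
) -> int:
    """Process units in order, stopping as soon as leadership appears lost.

    Returns the number of units handed to the store (including units that
    were already present and therefore left unchanged).
    """
    processed = 0
    for unit_id, value in units:
        # Re-check before every unit. This narrows the stale-leader window
        # but cannot close it, so the store must tolerate duplicates.
        if not is_leader():
            break
        store.apply(unit_id, value)
        processed += 1
    return processed
```

If an old leader stops partway through and a new leader reruns the whole
task, the units that were already applied are simply skipped by the store.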

ZK itself solves this internally by checking that it has a quorum for
every operation, which forces an ordering between the disconnection event
and trying to do something that relies upon being the leader. Other systems
forcibly terminate old leaders before allowing a new leader to take the
throne.
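
A related way to get that ordering, when the old leader cannot be terminated,
is to have the shared resource itself reject stale leaders. The sketch below
is only an illustration of that idea (often called a fencing token), not a
description of how ZK or Curator work internally. It assumes each leader is
handed a monotonically increasing epoch number at election time;
FencedResource, StaleLeaderError and the epoch parameter are hypothetical
names made up for the example.

```python
from typing import List, Tuple


class StaleLeaderError(Exception):
    """Raised when a write carries an epoch older than one already seen."""


class FencedResource:
    """Hypothetical shared resource that only accepts writes from the newest leader."""

    def __init__(self) -> None:
        self.highest_epoch = 0
        self.log: List[Tuple[int, str]] = []

    def write(self, epoch: int, payload: str) -> None:
        # Assumption: epochs are issued in increasing order at election time,
        # so a smaller epoch means the writer has since been superseded.
        if epoch < self.highest_epoch:
            raise StaleLeaderError(
                f"epoch {epoch} is older than {self.highest_epoch}"
            )
        self.highest_epoch = epoch
        self.log.append((epoch, payload))
```

With this in place, a leader that has lost its position but has not yet seen
SUSPENDED/LOST gets an error on its next write instead of silently
interfering with the new leader.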

Henry


> > You can't assume that the notification is received locally before another
> > leader election finishes elsewhere
> Which notification? The ConnectionStateListener is an abstraction on
> ZooKeeper's watcher mechanism. It's only significant for the VM that is the
> leader. Non-leaders don't need to be concerned.


> -JZ
>
> On Dec 8, 2012, at 9:12 PM, Henry Robinson <henry@cloudera.com> wrote:
>
> > You can't assume that the notification is received locally before another
> > leader election finishes elsewhere (particularly if you are running slowly
> > for some reason!), so it's not sufficient to guarantee that the process
> > that is running locally has finished before someone else starts another.
> >
> > It's usually best - if possible - to restructure the system so that
> > processes are idempotent to work around these kinds of problem, in
> > conjunction with using the kind of primitives that Curator provides.
> >
> > Henry
> >
> > On 8 December 2012 21:04, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:
> >
> >> This is why you need a ConnectionStateListener. You'll get a notice that
> >> the connection has been suspended and you should assume all locks/leaders
> >> are invalid.
> >>
> >> -JZ
> >>
> >> On Dec 8, 2012, at 9:02 PM, Henry Robinson <henry@cloudera.com> wrote:
> >>
> >>> What about a network disconnection? Presumably leadership is revoked when
> >>> the leader appears to have failed, which can be for more reasons than a VM
> >>> crash (VM running slow, network event, GC pause etc).
> >>>
> >>> Henry
> >>>
> >>> On 8 December 2012 21:00, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:
> >>>
> >>>> The leader latch lock is the equivalent of task in progress. I assume the
> >>>> task is running in the same VM as the leader lock. The only reason the VM
> >>>> would lose leadership is if it crashes in which case the process would die
> >>>> anyway.
> >>>>
> >>>> -JZ
> >>>>
> >>>> On Dec 8, 2012, at 8:56 PM, Eric Pederson <ericacm@gmail.com> wrote:
> >>>>
> >>>>> If I recall correctly it was Henry Robinson that gave me the advice to
> >>>>> have a "task in progress" check.
> >>>>>
> >>>>>
> >>>>> -- Eric
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sat, Dec 8, 2012 at 11:54 PM, Eric Pederson <ericacm@gmail.com> wrote:
> >>>>>
> >>>>>> I am using Curator LeaderLatch :)
> >>>>>>
> >>>>>>
> >>>>>> -- Eric
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Dec 8, 2012 at 11:52 PM, Jordan Zimmerman <
> >>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>
> >>>>>>> You might check your leader implementation. Writing a correct leader
> >>>>>>> recipe is actually quite challenging due to edge cases. Have a look at
> >>>>>>> Curator (disclosure: I wrote it) for an example.
> >>>>>>>
> >>>>>>> -JZ
> >>>>>>>
> >>>>>>> On Dec 8, 2012, at 8:49 PM, Eric Pederson <ericacm@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Actually I had the same thought and didn't consider having to do this
> >>>>>>>> until I talked about my project at a Zookeeper User Group a month or so
> >>>>>>>> ago and I was given this advice.
> >>>>>>>>
> >>>>>>>> I know that I do see leadership being lost/transferred when one of the
> >>>>>>>> ZK servers is restarted (not the whole ensemble). And it seems like
> >>>>>>>> I've seen it happen even when the ensemble stays totally stable (though
> >>>>>>>> I am not 100% sure as it's been a while since I have worked on this
> >>>>>>>> particular application).
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> -- Eric
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Sat, Dec 8, 2012 at 11:25 PM, Jordan Zimmerman <
> >>>>>>>> jordan@jordanzimmerman.com> wrote:
> >>>>>>>>
> >>>>>>>>> Why would it lose leadership? The only reason I can think of is if the
> >>>>>>>>> ZK cluster goes down. In normal use, the ZK cluster won't go down (I
> >>>>>>>>> assume you're running 3 or 5 instances).
> >>>>>>>>>
> >>>>>>>>> -JZ
> >>>>>>>>>
> >>>>>>>>> On Dec 8, 2012, at 8:17 PM, Eric Pederson <ericacm@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> During the time the task is running a cluster member could lose its
> >>>>>>>>>> leadership.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Henry Robinson
> >>> Software Engineer
> >>> Cloudera
> >>> 415-994-6679
> >>
> >>
> >
> >
> > --
> > Henry Robinson
> > Software Engineer
> > Cloudera
> > 415-994-6679
>
>


-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679
