zookeeper-user mailing list archives

From Vitalii Tymchyshyn <tiv...@gmail.com>
Subject Re: leader election, scheduled tasks, losing leadership
Date Tue, 11 Dec 2012 20:09:10 GMT
I am asking because you have this "at most once" vs "at least once" problem.
I don't think you can have "exactly once" unless your jobs are transactional
and you can synchronize your transaction commits to ZooKeeper (and better
with two-phase commit). So, you need to decide which of the two you can
live with.

What I'd recommend to you is a queue-like architecture, not a lock-based
one. This way you can:
a) Do parallel task processing.
b) Increase the timeouts to be larger than the maximum task time, e.g. set
   them to one hour. This would mean that a task will be restarted in an
   hour if the client running it fails.

But this would mean moving task status/queueing from the database to
ZooKeeper. To my mind that would be good, as the database is a SPOF for
you. A rough sketch of what I mean follows below.
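
Something along these lines - an untested sketch only; the znode paths and
class name are made up, and the Curator calls used (create / checkExists /
delete) are the standard ones (adjust package names if you are still on the
Netflix-era artifacts):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;

    // Queue-like task claims: a worker claims a task with an ephemeral znode.
    // If the worker's session dies mid-task, the claim expires and another
    // worker can re-run the task (at-least-once semantics).
    public class TaskClaim {
        private final CuratorFramework client;

        public TaskClaim(String connectString) {
            // Session timeout larger than the longest task (here one hour),
            // so a dead worker's claim only disappears after the task window.
            client = CuratorFrameworkFactory.newClient(connectString,
                    60 * 60 * 1000, 15 * 1000,
                    new ExponentialBackoffRetry(1000, 3));
            client.start();
        }

        // true if we claimed the task; false if it is done or someone holds it.
        public boolean tryClaim(String taskId) throws Exception {
            if (client.checkExists().forPath("/tasks/done/" + taskId) != null) {
                return false;                       // already completed
            }
            try {
                client.create().creatingParentsIfNeeded()
                      .withMode(CreateMode.EPHEMERAL)
                      .forPath("/tasks/claims/" + taskId);
                return true;
            } catch (KeeperException.NodeExistsException e) {
                return false;                       // another worker is on it
            }
        }

        // Record completion with a persistent marker, then release the claim.
        public void markDone(String taskId) throws Exception {
            client.create().creatingParentsIfNeeded()
                  .forPath("/tasks/done/" + taskId);
            client.delete().forPath("/tasks/claims/" + taskId);
        }
    }

Each worker loops over the pending task ids and calls tryClaim() before doing
the work. The long session timeout is what gives you the "restart in an hour
if the client fails" behaviour, and because claims are per-task the workers
can process different tasks in parallel. Note this is at-least-once, not
exactly-once: a worker that dies after doing the work but before markDone()
leaves the task to be re-run.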

Best regards, Vitalii Tymchyshyn


2012/12/10 Eric Pederson <ericacm@gmail.com>

> It depends on the scheduled task.  Some have status fields in the database
> that indicate new/in-progress/done, but others do not.
>
>
> -- Eric
>
>
>
> On Mon, Dec 10, 2012 at 1:49 AM, Vitalii Tymchyshyn <tivv00@gmail.com>
> wrote:
>
> > How are you going to ensure atomicity? I mean, if your processor dies in
> > the middle of the operation, how do you know if it is done or not?
> >
> > --
> > Best regards,
> > Vitalii Tymchyshyn
> > On 10 Dec 2012 00:11, "Eric Pederson" <ericacm@gmail.com> wrote:
> >
> > > Also sometimes the app leadership (via LeaderLatch) will get lost - I
> > > will follow up about this on the Curator list:
> > > https://gist.github.com/4247226
> > >
> > > So back to my previous question, what is the best way to implement the
> > > "fence"?
> > >
> > > -- Eric
> > >
> > >
> > >
> > > > On Sun, Dec 9, 2012 at 4:42 PM, Eric Pederson <ericacm@gmail.com>
> > > > wrote:
> > >
> > > > The irony is that I am using leader election to convert non-idempotent
> > > > operations into idempotent operations :)   For example, once a night a
> > > > report is emailed out to a set of addresses.   We don't want the report
> > > > to go to the same person more than once.
> > > >
> > > > Prior to using leader election one of the cluster members was designated
> > > > as the scheduled task "leader" using a system property.  But if that
> > > > cluster member crashed it required a manual operation to failover the
> > > > "leader" responsibility to another cluster member.   I considered using
> > > > app-specific techniques to make the scheduled tasks idempotent (for
> > > > example using "select for update" / database locking) but I wanted a
> > > > general solution and I needed clustering support for other reasons
> > > > (cluster membership, etc).
> > > >
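
By the way, the "select for update" variant mentioned above would look
roughly like this - an untested sketch; the table, columns and the
sendReportEmails() helper are invented:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Lock the task's status row, skip the work if another node already did
    // it, otherwise do the work and mark it done in the same transaction.
    // Assumes a scheduled_task row already exists for the task id.
    public class SelectForUpdateGuard {

        public boolean runOnce(Connection conn, String taskId) throws Exception {
            conn.setAutoCommit(false);
            try {
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT status FROM scheduled_task WHERE id = ? FOR UPDATE")) {
                    ps.setString(1, taskId);
                    try (ResultSet rs = ps.executeQuery()) {
                        if (rs.next() && "DONE".equals(rs.getString("status"))) {
                            conn.rollback();
                            return false;          // another node already ran it
                        }
                    }
                }
                sendReportEmails();                // the non-idempotent work
                try (PreparedStatement upd = conn.prepareStatement(
                        "UPDATE scheduled_task SET status = 'DONE' WHERE id = ?")) {
                    upd.setString(1, taskId);
                    upd.executeUpdate();
                }
                conn.commit();
                return true;
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }

        private void sendReportEmails() {
            // placeholder for the real report-sending code
        }
    }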
> > > > Anyway, here is the code that I'm using.
> > > >
> > > > Application startup (using Curator LeaderLatch):
> > > > https://gist.github.com/3936162
> > > > https://gist.github.com/3935895
> > > > https://gist.github.com/3935889
> > > >
> > > > ClusterStatus:
> > > > https://gist.github.com/3943149
> > > > https://gist.github.com/3935861
> > > >
> > > > Scheduled task:
> > > > https://gist.github.com/4246388
> > > >
> > > > In the last gist the "distribute" scheduled task is run every 30
> > > > seconds.  It checks clusterStatus.isLeader to see if the current
> > > > cluster member is the leader before running the real method (which
> > > > sends email).  clusterStatus() calls methods on LeaderLatch.
> > > >
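
The shape of that check is roughly the following - a sketch only; the class
and method names are invented, not the ones from the gists, and I assume
clusterStatus.isLeader ends up calling LeaderLatch.hasLeadership():

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.recipes.leader.LeaderLatch;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Every cluster member runs the scheduled task, but only the current
    // leader actually does the (non-idempotent) work.
    public class DistributeTask {
        private final LeaderLatch leaderLatch;
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        public DistributeTask(CuratorFramework client) throws Exception {
            leaderLatch = new LeaderLatch(client, "/app/leader");
            leaderLatch.start();
        }

        public void schedule() {
            scheduler.scheduleAtFixedRate(new Runnable() {
                @Override
                public void run() {
                    // Leadership check before the real work.
                    if (leaderLatch.hasLeadership()) {
                        sendEmails();
                    }
                }
            }, 30, 30, TimeUnit.SECONDS);
        }

        private void sendEmails() {
            // the real work goes here
        }
    }

Note that such a check narrows but does not close the window discussed
further down in the thread: leadership can still be lost between
hasLeadership() returning true and the work completing.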
> > > > Here is the output that I am seeing if I kill the ZK quorum leader
> > > > and the app cluster member that was the leader loses its LeaderLatch
> > > > leadership to another cluster member:
> > > > https://gist.github.com/4247058
> > > >
> > > >
> > > > -- Eric
> > > >
> > > >
> > > >
> > > > On Sun, Dec 9, 2012 at 12:30 AM, Henry Robinson <henry@cloudera.com>
> > > > wrote:
> > > >
> > > >> On 8 December 2012 21:18, Jordan Zimmerman <jordan@jordanzimmerman.com>
> > > >> wrote:
> > > >>
> > > >> > If your ConnectionStateListener gets SUSPENDED or LOST you've lost
> > > >> > connection to ZooKeeper. Therefore you cannot use that same ZooKeeper
> > > >> > connection to manage a node that denotes the process is running or
> > > >> > not. Only 1 VM at a time will be running the process. That process
> > > >> > can watch for SUSPENDED/LOST and wind down the task.
> > > >> >
> > > >> >
> > > >> My point is that by the time that VM sees SUSPENDED/LOST, another VM
> > > >> may have been elected leader and have started running another process.
> > > >>
> > > >> It's a classic problem - you need some mechanism to fence a node that
> > > >> thinks it's the leader, but isn't and hasn't got the memo yet. The way
> > > >> around the problem is to either ensure that no work is done by you
> > > >> once you are no longer the leader (perhaps by checking every time you
> > > >> want to do work), or that the work you do does not affect the system
> > > >> (e.g. by using idempotent work units).
> > > >>
> > > >> ZK itself solves this internally by checking that it has a quorum for
> > > >> every operation, which forces an ordering between the disconnection
> > > >> event and trying to do something that relies upon being the leader.
> > > >> Other systems forcibly terminate old leaders before allowing a new
> > > >> leader to take the throne.
> > > >>
> > > >> Henry
> > > >>
> > > >>
> > > >> > > You can't assume that the notification is received locally before
> > > >> > > another leader election finishes elsewhere
> > > >> > Which notification? The ConnectionStateListener is an abstraction on
> > > >> > ZooKeeper's watcher mechanism. It's only significant for the VM that
> > > >> > is the leader. Non-leaders don't need to be concerned.
> > > >>
> > > >> > -JZ
> > > >> >
> > > >> > On Dec 8, 2012, at 9:12 PM, Henry Robinson <henry@cloudera.com>
> > > >> > wrote:
> > > >> >
> > > >> > > You can't assume that the notification is received locally before
> > > >> > > another leader election finishes elsewhere (particularly if you
> > > >> > > are running slowly for some reason!), so it's not sufficient to
> > > >> > > guarantee that the process that is running locally has finished
> > > >> > > before someone else starts another.
> > > >> > >
> > > >> > > It's usually best - if possible - to restructure the system so that
> > > >> > > processes are idempotent to work around these kinds of problem, in
> > > >> > > conjunction with using the kind of primitives that Curator provides.
> > > >> > >
> > > >> > > Henry
> > > >> > >
> > > >> > > On 8 December 2012 21:04, Jordan Zimmerman <jordan@jordanzimmerman.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > >> This is why you need a ConnectionStateListener. You'll get a
> > > >> > >> notice that the connection has been suspended and you should
> > > >> > >> assume all locks/leaders are invalid.
> > > >> > >>
> > > >> > >> -JZ
> > > >> > >>
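
For completeness, a bare-bones version of such a listener might look like
this - a sketch; the guard class and flag are invented, while the listener
registration and ConnectionState values are Curator's standard API:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.state.ConnectionState;
    import org.apache.curator.framework.state.ConnectionStateListener;
    import java.util.concurrent.atomic.AtomicBoolean;

    // On SUSPENDED or LOST, treat any lock/leadership we think we hold as
    // invalid and stop doing leader-only work until we are reconnected.
    public class LeadershipGuard implements ConnectionStateListener {
        private final AtomicBoolean safeToAct = new AtomicBoolean(false);

        public void register(CuratorFramework client) {
            client.getConnectionStateListenable().addListener(this);
        }

        @Override
        public void stateChanged(CuratorFramework client, ConnectionState newState) {
            switch (newState) {
                case SUSPENDED:
                case LOST:
                    safeToAct.set(false);   // assume all locks/leaders are invalid
                    break;
                case CONNECTED:
                case RECONNECTED:
                    safeToAct.set(true);    // still must re-check leadership itself
                    break;
                default:
                    break;
            }
        }

        public boolean isSafeToAct() {
            return safeToAct.get();
        }
    }

The long-running task then polls isSafeToAct() (or gets interrupted) so it
can wind down when the connection goes away.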
> > > >> > >> On Dec 8, 2012, at 9:02 PM, Henry Robinson <henry@cloudera.com>
> > > >> > >> wrote:
> > > >> > >>
> > > >> > >>> What about a network disconnection? Presumably leadership is
> > > >> > >>> revoked when the leader appears to have failed, which can be for
> > > >> > >>> more reasons than a VM crash (VM running slow, network event, GC
> > > >> > >>> pause etc).
> > > >> > >>>
> > > >> > >>> Henry
> > > >> > >>>
> > > >> > >>> On 8 December 2012 21:00, Jordan Zimmerman <jordan@jordanzimmerman.com>
> > > >> > >>> wrote:
> > > >> > >>>
> > > >> > >>>> The leader latch lock is the equivalent of task in progress. I
> > > >> > >>>> assume the task is running in the same VM as the leader lock.
> > > >> > >>>> The only reason the VM would lose leadership is if it crashes
> > > >> > >>>> in which case the process would die anyway.
> > > >> > >>>>
> > > >> > >>>> -JZ
> > > >> > >>>>
> > > >> > >>>> On Dec 8, 2012, at 8:56 PM, Eric Pederson <ericacm@gmail.com>
> > > >> > >>>> wrote:
> > > >> > >>>>
> > > >> > >>>>> If I recall correctly it was Henry Robinson that gave me the
> > > >> > >>>>> advice to have a "task in progress" check.
> > > >> > >>>>>
> > > >> > >>>>>
> > > >> > >>>>> -- Eric
> > > >> > >>>>>
> > > >> > >>>>>
> > > >> > >>>>>
> > > >> > >>>>> On Sat, Dec 8, 2012 at 11:54 PM, Eric Pederson <ericacm@gmail.com>
> > > >> > >>>>> wrote:
> > > >> > >>>>>
> > > >> > >>>>>> I am using Curator LeaderLatch :)
> > > >> > >>>>>>
> > > >> > >>>>>>
> > > >> > >>>>>> -- Eric
> > > >> > >>>>>>
> > > >> > >>>>>>
> > > >> > >>>>>>
> > > >> > >>>>>>
> > > >> > >>>>>> On Sat, Dec 8, 2012 at 11:52 PM, Jordan Zimmerman <
> > > >> > >>>>>> jordan@jordanzimmerman.com> wrote:
> > > >> > >>>>>>
> > > >> > >>>>>>> You might check your leader implementation. Writing a correct
> > > >> > >>>>>>> leader recipe is actually quite challenging due to edge cases.
> > > >> > >>>>>>> Have a look at Curator (disclosure: I wrote it) for an example.
> > > >> > >>>>>>>
> > > >> > >>>>>>> -JZ
> > > >> > >>>>>>>
> > > >> > >>>>>>> On Dec 8, 2012, at 8:49 PM, Eric Pederson <ericacm@gmail.com>
> > > >> > >>>>>>> wrote:
> > > >> > >>>>>>>
> > > >> > >>>>>>>> Actually I had the same thought and didn't consider having
> > > >> > >>>>>>>> to do this until I talked about my project at a Zookeeper
> > > >> > >>>>>>>> User Group a month or so ago and I was given this advice.
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> I know that I do see leadership being lost/transferred when
> > > >> > >>>>>>>> one of the ZK servers is restarted (not the whole ensemble).
> > > >> > >>>>>>>> And it seems like I've seen it happen even when the ensemble
> > > >> > >>>>>>>> stays totally stable (though I am not 100% sure as it's been
> > > >> > >>>>>>>> a while since I have worked on this particular application).
> > > >> > >>>>>>>>
> > > >> > >>>>>>>>
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> -- Eric
> > > >> > >>>>>>>>
> > > >> > >>>>>>>>
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> On Sat, Dec 8, 2012 at 11:25 PM, Jordan Zimmerman <
> > > >> > >>>>>>>> jordan@jordanzimmerman.com> wrote:
> > > >> > >>>>>>>>
> > > >> > >>>>>>>>> Why would it lose leadership? The only reason I can think
> > > >> > >>>>>>>>> of is if the ZK cluster goes down. In normal use, the ZK
> > > >> > >>>>>>>>> cluster won't go down (I assume you're running 3 or 5
> > > >> > >>>>>>>>> instances).
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>> -JZ
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>> On Dec 8, 2012, at 8:17 PM, Eric Pederson <ericacm@gmail.com>
> > > >> > >>>>>>>>> wrote:
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>>> During the time the task is running a cluster member
> > > >> > >>>>>>>>>> could lose its leadership.
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>
> > > >> > >>>>>>>
> > > >> > >>>>>>
> > > >> > >>>>
> > > >> > >>>>
> > > >> > >>>
> > > >> > >>>
> > > >> > >>> --
> > > >> > >>> Henry Robinson
> > > >> > >>> Software Engineer
> > > >> > >>> Cloudera
> > > >> > >>> 415-994-6679
> > > >> > >>
> > > >> > >>
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > Henry Robinson
> > > >> > > Software Engineer
> > > >> > > Cloudera
> > > >> > > 415-994-6679
> > > >> >
> > > >> >
> > > >>
> > > >>
> > > >> --
> > > >> Henry Robinson
> > > >> Software Engineer
> > > >> Cloudera
> > > >> 415-994-6679
> > > >>
> > > >
> > > >
> > >
> >
>



-- 
Best regards,
 Vitalii Tymchyshyn
