zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Unexpected behavior with Session Timeouts in Java Client
Date Sat, 23 Apr 2011 01:31:29 GMT
I wouldn't want to change the current behavior but rather to augment it and
comply with Ben's dictum that ZK report what it knows as it knows it. The
client doesn't know about expiration until it reconnects, but it really does
know that the expiration period has passed since the connection was lost.
The name of any new event should therefore reflect what is known.

It does seem good to make current behavior the default.

On Fri, Apr 22, 2011 at 5:10 PM, Scott Fines <scottfines@gmail.com> wrote:

> That is one option. It seems like it might complicate what is already a
> fairly subtle system of considerations, though.
>
> An alternative might be to have an option like "fireOnProbableExpiration"
> in the ZooKeeper instance. The default would then have to be set to false,
> which would preserve the current behavior, but setting it to true would
> provide an option for when we absolutely NEED the behavior.
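Scott's flag could look roughly like the sketch below. `ExpirationOptions` and `shouldFireProbableExpiration` are hypothetical names; only `fireOnProbableExpiration` comes from the thread, and it is a proposal, not an existing ZooKeeper setting. Counting the timeout from the connection-loss event is also an assumption. Leaving the flag at its default of `false` keeps today's behavior.

```java
// Hypothetical options holder; ZooKeeper does not ship this class.
final class ExpirationOptions {
    // Name proposed in the thread; defaults to false so current behavior is preserved.
    private boolean fireOnProbableExpiration = false;

    ExpirationOptions fireOnProbableExpiration(boolean enabled) {
        this.fireOnProbableExpiration = enabled;
        return this;
    }

    boolean fireOnProbableExpiration() {
        return fireOnProbableExpiration;
    }

    /**
     * Whether a (hypothetical) client should raise a "probable expiration"
     * notification at nowMillis. disconnectedAtMillis < 0 means "connected".
     * Assumption: the timeout is counted from the connection-loss event.
     */
    boolean shouldFireProbableExpiration(long disconnectedAtMillis,
                                         long sessionTimeoutMillis,
                                         long nowMillis) {
        if (!fireOnProbableExpiration || disconnectedAtMillis < 0) {
            return false; // opted out, or still connected
        }
        return nowMillis - disconnectedAtMillis >= sessionTimeoutMillis;
    }
}
```

An application that needs the early signal would opt in with `new ExpirationOptions().fireOnProbableExpiration(true)`; everyone else sees no change.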
>
> Scott
>
> On Fri, Apr 22, 2011 at 5:38 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> > Ben, Everybody,
> >
> > What would you think if there were additional events such as
> > "PossibleSessionExpiration", "EstimatedSessionExpiration" and
> > "ProbableSessionExpiration"? These events would be delivered by the
> > client at a time based on the last successful heartbeat, an intermediate
> > point, or the connection loss event, respectively.
> >
> > Does this sound interesting?
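A sketch of when each proposed event would fire, assuming each one is scheduled one session timeout after its reference point and taking the "intermediate point" to be the midpoint between the last heartbeat and the connection-loss event. The event names follow the proposal; `ProposedExpiry`, `ExpiryScheduler` and the midpoint choice are illustrative assumptions, not ZooKeeper API.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical event names from the proposal; ZooKeeper's
// Watcher.Event.KeeperState has no such values.
enum ProposedExpiry { POSSIBLE, ESTIMATED, PROBABLE }

final class ExpiryScheduler {
    private ExpiryScheduler() {}

    /**
     * Time (ms) at which each proposed event would fire, one session timeout
     * after its reference point:
     *   POSSIBLE  - from the last successful heartbeat (earliest),
     *   ESTIMATED - from the heartbeat/loss midpoint (assumed),
     *   PROBABLE  - from the connection-loss event (latest).
     */
    static Map<ProposedExpiry, Long> fireTimes(long lastHeartbeatMs,
                                               long connectionLossMs,
                                               long sessionTimeoutMs) {
        if (connectionLossMs < lastHeartbeatMs) {
            throw new IllegalArgumentException("connection loss precedes last heartbeat");
        }
        long midpointMs = lastHeartbeatMs + (connectionLossMs - lastHeartbeatMs) / 2;
        Map<ProposedExpiry, Long> times = new EnumMap<>(ProposedExpiry.class);
        times.put(ProposedExpiry.POSSIBLE, lastHeartbeatMs + sessionTimeoutMs);
        times.put(ProposedExpiry.ESTIMATED, midpointMs + sessionTimeoutMs);
        times.put(ProposedExpiry.PROBABLE, connectionLossMs + sessionTimeoutMs);
        return times;
    }
}
```

For example, a last heartbeat at t = 1,000 ms, a connection-loss event at t = 3,000 ms and a 10,000 ms timeout give 11,000, 12,000 and 13,000 ms. The spread between the first and last event equals the lag between losing connectivity and being told about it.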
> >
> > On Fri, Apr 22, 2011 at 3:06 PM, Dave Wright <wrightd@gmail.com> wrote:
> >
> > > We ran into this exact scenario, and while it would have been nice to
> > > have the timer option implemented internally by ZK, we ended up
> > > implementing it externally ourselves. We start a timer on the
> > > disconnected event, and when it gets "close" to the session timeout,
> > > we trigger the session lost behavior on the master.
> > > We may be without a master for a second or two, but that's OK in our
> > > case. As Ted mentioned, without a connection to ZK, there is no way to
> > > time it exactly anyway.
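A minimal sketch of the application-side timer described above: start a countdown on the disconnect notification, cancel it if the connection comes back, and run the "session lost" action shortly before the session timeout. To stay self-contained it uses a hypothetical `ConnectionState` enum standing in for ZooKeeper's `Watcher.Event.KeeperState` values (`SyncConnected`, `Disconnected`, `Expired`); in a real client, `onStateChange` would be called from a `Watcher`. `SessionLossTimer` and the safety margin are assumptions for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Stand-in for the ZooKeeper Watcher.Event.KeeperState values used here.
enum ConnectionState { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

final class SessionLossTimer {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final long sessionTimeoutMs;
    private final long safetyMarginMs;    // assumed: how far before the timeout to give up
    private final Runnable onSessionLost; // e.g. stop acting as master
    private ScheduledFuture<?> pending;   // countdown started on disconnect, if any

    SessionLossTimer(long sessionTimeoutMs, long safetyMarginMs, Runnable onSessionLost) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.safetyMarginMs = safetyMarginMs;
        this.onSessionLost = onSessionLost;
    }

    /** Delay between the disconnect notification and the session-lost action. */
    long triggerDelayMs() {
        return Math.max(0, sessionTimeoutMs - safetyMarginMs);
    }

    /** Call from the connection watcher whenever the connection state changes. */
    synchronized void onStateChange(ConnectionState state) {
        switch (state) {
            case DISCONNECTED:
                // Start the countdown once; repeated disconnect events do not restart it.
                if (pending == null) {
                    pending = scheduler.schedule(onSessionLost, triggerDelayMs(), TimeUnit.MILLISECONDS);
                }
                break;
            case SYNC_CONNECTED:
                // Reconnected: stop the countdown if it has not fired yet.
                if (pending != null) {
                    pending.cancel(false);
                    pending = null;
                }
                break;
            case EXPIRED: {
                // Expiration is confirmed. Run the action here unless the timer
                // task has already started (cancel returns false in that case).
                boolean runNow = (pending == null) || pending.cancel(false);
                pending = null;
                if (runNow) {
                    onSessionLost.run();
                }
                break;
            }
        }
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

Because the countdown starts at the disconnect notification rather than at the last heartbeat, it can fire later than ideal when that notification itself arrives late.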
> > >
> > > The one advantage of having the session-lost timer running within
> > > zkclient instead of our app, is that it could track the timer from the
> > > last actual heartbeat, rather than the disconnected event. Depending
> > > on the network conditions that caused the disconnection, it may have
> > > been a while from when we actually lost connectivity to ZK to when the
> > > disconnection event triggers, so our own timer may not be super
> > > accurate. Having zkclient set a timer based on the last heartbeat, and
> > > triggering the session lost event when that timer expires would be
> > > more accurate.
> > >
> > > -Dave
> > >
> > >
> > > On Fri, Apr 22, 2011 at 10:03 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > > Well, there are real limits on what knowledge you can have in a
> > > > split-brain situation and how much coordination there can be.
> > > >
> > > > Having exactly one master in such a situation is impossible. You get
> > > > to pick your error scenario, however. One option is to have one master
> > > > almost all the time with a failure mode of having zero acting masters
> > > > a bit of the time. The other option is to have one master almost all
> > > > the time with a failure mode that has two masters a bit of the time.
> > > > You get to pick which one.
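The trade-off can be made concrete with a little arithmetic. A sketch, assuming the servers expire the session roughly one timeout after they last heard from the old master and that a replacement master takes over as soon as that happens; `Handover` and its parameters are illustrative, not part of any ZooKeeper API. If the old master steps down before that instant, there are briefly zero masters; if it steps down after, there are briefly two.

```java
final class Handover {
    private Handover() {}

    /**
     * Signed gap (ms) between the old master stepping down and the servers
     * expiring its session (assumed: last heard + timeout, with the new master
     * starting immediately). Positive: zero masters. Negative: two masters.
     */
    static long gapMillis(long lastHeardByServerMs, long sessionTimeoutMs,
                          long oldMasterStepsDownMs) {
        long newMasterStartsMs = lastHeardByServerMs + sessionTimeoutMs;
        return newMasterStartsMs - oldMasterStepsDownMs;
    }

    static String describe(long gapMs) {
        if (gapMs > 0) return "zero masters for " + gapMs + " ms";
        if (gapMs < 0) return "two masters for " + (-gapMs) + " ms";
        return "clean handover";
    }
}
```

With the server last hearing from the master at t = 0 and a 10,000 ms timeout, stepping down early on a timer at 9,000 ms gives "zero masters for 1000 ms", while waiting for an expiration notice that only arrives after reconnecting at 12,000 ms gives "two masters for 2000 ms". The old master cannot observe the exact server-side instant, so it can only choose which sign of error it prefers.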
> > > >
> > > > As Ben stated, the philosophy of ZK is to report facts that can be
> > > > demonstrated. Your application will work pretty well with a timer
> > > > even though that could result in momentary double-master situations.
> > > > Of course, it can also result in periods with zero masters, since a
> > > > master cut off from ZK may well be cut off from the clients who want
> > > > to be served.
> > > >
> > > > So the API isn't making a promise it can't keep. It is promising to
> > > > report to you as soon as it is certain of things. And it does.
> > > >
> > > > On Fri, Apr 22, 2011 at 6:51 AM, Scott Fines <scottfines@gmail.com> wrote:
> > > >
> > > >> I guess my objection would be that the API is making a promise that
> > > >> it can only deliver part of the time. If the client can't reconnect
> > > >> to ZooKeeper, then as far as the client knows its session hasn't
> > > >> expired, which is an unusual state to find oneself in. In
> > > >> leader-election systems like mine, that could result in two practical
> > > >> leaders while ZooKeeper insists that there is only one. This kind of
> > > >> split-brain scenario seems unavoidable in the absence of
> > > >> probabilistic failure checking (like timeouts).
> > > >>
> > > >> The FAQ, I've noticed, does mention this phenomenon. Perhaps
> > > >> something should be added there explaining the why and not just the
> > > >> mechanics. Otherwise, developers such as myself might find themselves
> > > >> unduly confused by it :)
> > > >>
> > > >> Thanks for all your help,
> > > >>
> > > >
> > >
> >
>
