zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Unexpected behavior with Session Timeouts in Java Client
Date Fri, 22 Apr 2011 22:38:14 GMT
Ben, Everybody,

What would you think if there were additional events such as
"PossibleSessionExpiration", "EstimatedSessionExpiration" and
"ProbableSessionExpiration"?  These events would be delivered
by the client at times based on the last successful heartbeat, an
intermediate point, or the connection loss event, respectively.

Does this sound interesting?
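None of these three events exist in the ZooKeeper Java client. The following is a minimal sketch of how their delivery times could be computed, assuming the client knows its negotiated session timeout, the time of its last successful heartbeat, and the time of the connection-loss event. Reading "an intermediate point" as the midpoint between heartbeat and connection loss is an assumption, and `SuspectedExpiration` and `ExpirationSchedule` are hypothetical names.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical event types mirroring the proposed PossibleSessionExpiration,
// EstimatedSessionExpiration and ProbableSessionExpiration; not ZooKeeper API.
enum SuspectedExpiration {
    POSSIBLE,   // timed from the last successful heartbeat (earliest)
    ESTIMATED,  // timed from the midpoint between heartbeat and connection loss
    PROBABLE    // timed from the connection-loss event (latest)
}

final class ExpirationSchedule {
    private ExpirationSchedule() {}

    /** Returns when each suspected-expiration event would fire, in client-clock millis. */
    static Map<SuspectedExpiration, Long> compute(long lastHeartbeatMs,
                                                  long connectionLossMs,
                                                  long sessionTimeoutMs) {
        if (connectionLossMs < lastHeartbeatMs) {
            throw new IllegalArgumentException("connection loss precedes last heartbeat");
        }
        long midpointMs = lastHeartbeatMs + (connectionLossMs - lastHeartbeatMs) / 2;
        Map<SuspectedExpiration, Long> due = new EnumMap<>(SuspectedExpiration.class);
        due.put(SuspectedExpiration.POSSIBLE, lastHeartbeatMs + sessionTimeoutMs);
        due.put(SuspectedExpiration.ESTIMATED, midpointMs + sessionTimeoutMs);
        due.put(SuspectedExpiration.PROBABLE, connectionLossMs + sessionTimeoutMs);
        return due;
    }

    public static void main(String[] args) {
        // Last heartbeat at t=1000 ms, connection loss noticed at t=5000 ms, 10 s timeout.
        System.out.println(compute(1_000, 5_000, 10_000));
        // prints {POSSIBLE=11000, ESTIMATED=13000, PROBABLE=15000}
    }
}
```

The three events always arrive in the order possible, estimated, probable, so an application could act on whichever level of certainty matches its tolerance for a zero-master or double-master window.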

On Fri, Apr 22, 2011 at 3:06 PM, Dave Wright <wrightd@gmail.com> wrote:

> We ran into this exact scenario, and while it would have been nice to
> have the timer option implemented internally by ZK, we ended up
> implementing it externally ourselves. We start a timer on the
> disconnected event, and when it gets "close" to the session timeout,
> we trigger the session-lost behavior on the master.
> We may be without a master for a second or two, but that's OK in our
> case. As Ted mentioned, without a connection to ZK, there is no way to
> time it exactly anyway.
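A rough sketch of the external timer described above: start a timer on the disconnect, cancel it on reconnect, and run the session-lost handler once the timer reaches a chosen fraction of the session timeout. To stay self-contained it uses a stand-in enum instead of the real `Watcher.Event.KeeperState`; `DisconnectTimer`, `safetyFraction`, and the callback are illustrative names, not ZooKeeper API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-in for the three Watcher.Event.KeeperState values this sketch reacts to.
enum ConnState { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

final class DisconnectTimer {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "session-lost-timer");
                t.setDaemon(true); // never keeps the JVM alive on its own
                return t;
            });
    private final AtomicBoolean fired = new AtomicBoolean(false);
    private final long delayMs;
    private final Runnable onSessionLost;
    private ScheduledFuture<?> pending;

    /** safetyFraction: how much of the session timeout to wait, e.g. 0.8 for 80%. */
    DisconnectTimer(long sessionTimeoutMs, double safetyFraction, Runnable onSessionLost) {
        this.delayMs = (long) (sessionTimeoutMs * safetyFraction);
        this.onSessionLost = onSessionLost;
    }

    /** Feed every connection-state change from the Watcher into this method. */
    synchronized void onStateChange(ConnState state) {
        switch (state) {
            case DISCONNECTED:
                // The clock starts at the disconnect event, not at the last heartbeat.
                if (pending == null && !fired.get()) {
                    pending = scheduler.schedule(this::fireOnce, delayMs, TimeUnit.MILLISECONDS);
                }
                break;
            case SYNC_CONNECTED:
                cancelPending(); // reconnected in time: the session is still ours
                break;
            case EXPIRED:
                cancelPending(); // ZooKeeper confirmed expiry: act now unless already done
                fireOnce();
                break;
            default:
                break;
        }
    }

    private void cancelPending() {
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }

    // Runs the handler at most once; afterwards the app is assumed to discard this session.
    private void fireOnce() {
        if (fired.compareAndSet(false, true)) {
            onSessionLost.run();
        }
    }

    void shutdown() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        DisconnectTimer timer = new DisconnectTimer(1_000, 0.8,
                () -> System.out.println("session presumed lost: stepping down as master"));
        timer.onStateChange(ConnState.DISCONNECTED);
        Thread.sleep(1_000); // no reconnect arrives, so the handler runs after ~800 ms
        timer.shutdown();
    }
}
```

In a real client, `onStateChange` would be called from `Watcher.process(WatchedEvent)` after mapping `event.getState()` onto the stand-in enum. A reconnect that races with the timer may still see the handler run, which matches the "zero masters for a second or two" trade-off described above.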
> The one advantage of having the session-lost timer running within
> zkclient instead of our app is that it could track the timer from the
> last actual heartbeat rather than from the disconnected event. Depending
> on the network conditions that caused the disconnection, it may have
> been a while from when we actually lost connectivity to ZK to when the
> disconnection event triggers, so our own timer may not be super
> accurate. Having zkclient set a timer based on the last heartbeat and
> trigger the session-lost event when that timer expires would be
> more accurate.
> -Dave
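To illustrate why timing from the last heartbeat is more accurate, the sketch below compares the two deadlines on a single clock. The ZooKeeper Java client does not expose its heartbeat times publicly, so this assumes the application (or, as suggested above, the client library itself) can record when the server last replied. `HeartbeatDeadline` is a hypothetical name and the numbers in `main` are made up for illustration. Both deadlines remain estimates, since the server measures the timeout from when it last heard from the client.

```java
final class HeartbeatDeadline {
    private final long sessionTimeoutMs;
    private long lastHeardMs;

    HeartbeatDeadline(long sessionTimeoutMs, long connectedAtMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.lastHeardMs = connectedAtMs;
    }

    /** Record any successful reply from the server (a ping response or an operation result). */
    void onServerReply(long nowMs) {
        lastHeardMs = Math.max(lastHeardMs, nowMs); // ignore out-of-order timestamps
    }

    /** Estimated expiry, timed from the last time the server was actually heard from. */
    long deadlineFromLastHeartbeat() {
        return lastHeardMs + sessionTimeoutMs;
    }

    /** Estimated expiry, timed from when the disconnect event was delivered. */
    long deadlineFromDisconnect(long disconnectNoticedMs) {
        return disconnectNoticedMs + sessionTimeoutMs;
    }

    public static void main(String[] args) {
        HeartbeatDeadline d = new HeartbeatDeadline(30_000, 0);
        d.onServerReply(2_000);
        d.onServerReply(12_000);            // last reply before the network silently fails
        long disconnectNoticedMs = 32_000;  // illustrative: 20 s of silence before the event
        System.out.println("from last heartbeat: " + d.deadlineFromLastHeartbeat());            // 42000
        System.out.println("from disconnect:     " + d.deadlineFromDisconnect(disconnectNoticedMs)); // 62000
    }
}
```

With these illustrative numbers, a timer started at the disconnect event runs 20 s past the heartbeat-based estimate, which is exactly the silent interval before the disconnect was noticed.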
> On Fri, Apr 22, 2011 at 10:03 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > Well, there are real limits on what knowledge you can have in a split
> > brain and on how much coordination there can be.
> >
> > Having exactly one master in such a situation is impossible.  You get to
> > pick your error scenario, however.  One option is to have one master
> > almost all the time with a failure mode of having zero acting masters a
> > bit of the time.  The other option is to have one master almost all the
> > time with a failure mode that has two masters a bit of the time.  You get
> > to pick which one.
> >
> > As Ben stated, the philosophy of ZK is to report facts that can be
> > demonstrated.  Your application will work pretty well with a timer even
> > though that could result in momentary double-master situations.  Of
> > course, it can also result in periods of zero masters, since a master
> > cut off from ZK may well be cut off from the clients who want to be
> > served.
> >
> > So the API isn't making a promise it can't keep.  It is promising to
> > report to you as soon as it is certain of things.  And it does.
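The trade-off between zero and two masters can be put as simple arithmetic. Assuming one shared clock, a new master appears when ZooKeeper expires the old session plus some election delay; whether the isolated master steps down before or after that moment decides whether the gap shows up as zero masters or two. `FailoverWindows` is a hypothetical name and all times are illustrative.

```java
final class FailoverWindows {
    private FailoverWindows() {}

    /** Time with no acting master: the old one stepped down before the new one took over. */
    static long zeroMasterMs(long stepDownAtMs, long newMasterAtMs) {
        return Math.max(0L, newMasterAtMs - stepDownAtMs);
    }

    /** Time with two acting masters: the old one kept going after the new one took over. */
    static long doubleMasterMs(long stepDownAtMs, long newMasterAtMs) {
        return Math.max(0L, stepDownAtMs - newMasterAtMs);
    }

    public static void main(String[] args) {
        long expiryAtMs = 30_000;                // when ZooKeeper expires the old session (assumed)
        long newMasterAtMs = expiryAtMs + 500;   // plus an assumed 500 ms election delay
        long[] stepDownTimes = {24_000, 32_000}; // early (before expiry) vs. late (after expiry)
        for (long s : stepDownTimes) {
            System.out.println("step down at " + s + " ms: zero masters for "
                    + zeroMasterMs(s, newMasterAtMs) + " ms, two masters for "
                    + doubleMasterMs(s, newMasterAtMs) + " ms");
        }
        // step down at 24000 ms: zero masters for 6500 ms, two masters for 0 ms
        // step down at 32000 ms: zero masters for 0 ms, two masters for 1500 ms
    }
}
```

Exactly one of the two windows is non-zero unless the step-down and takeover times coincide, and hitting that point would require knowing the server's view of the expiry exactly, which a disconnected client cannot.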
> >
> > On Fri, Apr 22, 2011 at 6:51 AM, Scott Fines <scottfines@gmail.com> wrote:
> >
> >> I guess my objection would be that the API is making a promise that it
> >> can only deliver part of the time. If the client can't reconnect to
> >> ZooKeeper, then the client is never told that it has expired, which is
> >> an unusual state to find oneself in, and in leader-election systems like
> >> mine it could result in having two practical leaders while ZooKeeper
> >> insists that there is only one.  This kind of split-brain scenario seems
> >> unavoidable in the absence of probabilistic failure checking (like
> >> timeouts).
> >>
> >> The FAQ, I've noticed, does mention this phenomenon. Perhaps something
> >> should be added there about the why and not just the mechanics.
> >> Otherwise, developers such as myself might find themselves unduly
> >> confused by it :)
> >>
> >> Thanks for all your help,
> >>
> >
