hadoop-zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Weird ephemeral node issue
Date Mon, 30 Aug 2010 17:56:20 GMT
Rather than the wiki, it would be great to get this into the docs. Would you
mind creating a JIRA?
https://issues.apache.org/jira/browse/ZOOKEEPER

Thanks,

Patrick

On Tue, Aug 17, 2010 at 8:29 PM, Qing Yan <qingyan@gmail.com> wrote:

> Thanks for the explanation! I suggest this goes to the wiki.
>
> <quote>
> the client only finds out about session expiration events when the client
> reconnects to the cluster. if zk tells a client that its session is
> expired, the ephemerals that correspond to that session will already be
> cleaned up.
>
> - deletion of an ephemeral file due to loss of client connection will occur
> after the client gets a connection loss
>
> - deletion of an ephemeral file will precede delivery of a session
> expiration event to the owner
> </quote>
>
> So session expiration means two things here: the server view (ephemeral
> cleanup) and the client view (event delivery), and there is no guarantee
> how long it will take in between, correct?
>
> I guess the confusion arises from the documentation, which doesn't
> distinguish these two concepts, e.g. in the javadoc
> http://hadoop.apache.org/zookeeper/docs/r3.3.1/api/index.html
>
> An ephemeral node will be removed by the ZooKeeper automatically when the
> session associated with the creation of the node expires.
>
> It is actually referring to the server view, not the client view.
>
>
>
> On Wed, Aug 18, 2010 at 1:12 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> > Uncharacteristically, I think that Ben's comments could use a little bit
> > of amplification.
> >
> > First, ZK is designed with certain guarantees in mind and almost all
> > operational characteristics flow logically from these guarantees.
> >
> > The guarantee that Ben mentioned here in passing is that if a client gets
> > a session expiration, it is *guaranteed* that the ephemerals have been
> > cleaned up.  This guarantee is why session expiration is only reported
> > after reconnection: while the client is disconnected, it cannot know
> > whether the cluster is operating correctly and thus cannot know whether
> > the ephemerals have been cleaned up yet.  The only way to know for
> > certain that the cluster has cleaned up the ephemerals is to get back in
> > touch with an operating cluster.
> >
> > The client is not completely in the dark.  As Ben implied, it can know
> > that the cluster is unavailable (it got a ConnectionLoss event, after
> > all).  While the cluster is unavailable and before it gets a session
> > expiration notification, the client can go into safe mode.
> >
> > The moral of this story is that to get the most out of ZK, it is best to
> > adopt the same guarantee-based design process that drove ZK in the first
> > place.  The first step is to decide what guarantees you want to provide
> > and then work from ZK's guarantees to get to yours.
> >
> > In the classic leader-election use of ZK, the key guarantee that we want is:
> >
> > - the number of leaders is less than or equal to 1
> >
> > Note that you can't guarantee that the number == 1, because other stuff
> > could happen.  This has nothing to do with ZK.
> >
> > The pertinent ZK guarantees are:
> >
> > - an ephemeral file can only be created by a single session
> >
> > - deletion of an ephemeral file due to loss of client connection will
> > occur after the client gets a connection loss
> >
> > - deletion of an ephemeral file will precede delivery of a session
> > expiration event to the owner
> >
> > Phrased in terms of CSP-like constructs, the client has events
> > BecomeMaster, EnterSafeMode, ExitSafeMode, RelinquishMaster and Crash
> > that must occur according to this grammar:
> >
> > client := (
> >     (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; RelinquishMaster)
> >   | (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; Crash)
> >   | Crash
> > )*
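The grammar above can be checked mechanically against a recorded event trace. A minimal sketch, assuming a hypothetical one-letter encoding of the five events (nothing here is a ZooKeeper API): the two alternatives that start with BecomeMaster share a prefix, so they collapse into one regular expression, B(EX)*E?[RC], and the whole trace must match zero or more repetitions of that or a bare Crash.

```python
import re

# Hypothetical one-letter encoding of Ted's client events (not a ZooKeeper API).
CODES = {
    "BecomeMaster": "B",
    "EnterSafeMode": "E",
    "ExitSafeMode": "X",
    "RelinquishMaster": "R",
    "Crash": "C",
}

# client := ( (B; (E; X)*; E?; R) | (B; (E; X)*; E?; C) | C )*
# The first two alternatives share a prefix, so they merge into B(EX)*E?[RC].
CLIENT_GRAMMAR = re.compile(r"(?:B(?:EX)*E?[RC]|C)*")


def is_valid_trace(events):
    """Return True if the sequence of event names is a word of the grammar."""
    encoded = "".join(CODES[e] for e in events)
    return CLIENT_GRAMMAR.fullmatch(encoded) is not None
```

For example, BecomeMaster followed directly by ExitSafeMode is rejected, because a client can only leave safe mode after entering it.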
> >
> > To get the guarantees that we want, we can require the client to only do
> > BecomeMaster after it creates an ephemeral file and require it to either
> > Crash, RelinquishMaster or EnterSafeMode before that ephemeral file is
> > deleted.  The only way that we can do that is to immediately do
> > EnterSafeMode on connection loss and then do RelinquishMaster on session
> > expiration or ExitSafeMode when the connection is restored.  It is
> > involved, but you can actually do a proof of correctness from this that
> > shows that your guarantee will be honored even in the presence of ZK or
> > the client crashing or being partitioned.
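The rule in the last paragraph can be sketched as a small state machine. This is a hypothetical sketch that never calls ZooKeeper: the callback names are illustrative stand-ins for the connection events the client library delivers (in the Java client these roughly correspond to the Disconnected, SyncConnected and Expired keeper states), and the caller is assumed to invoke on_ephemeral_created only after its ephemeral node was actually created. It records the events using the names from Ted's grammar.

```python
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    MASTER = "master"
    SAFE_MODE = "safe_mode"  # was master; connection lost, session fate unknown


class LeaderGuard:
    """Hypothetical leader-election client following Ted's rule."""

    def __init__(self):
        self.role = Role.FOLLOWER
        self.trace = []  # events named as in the grammar

    def on_ephemeral_created(self):
        # BecomeMaster only after our ephemeral node exists.
        if self.role is Role.FOLLOWER:
            self.role = Role.MASTER
            self.trace.append("BecomeMaster")

    def on_connection_loss(self):
        # Stop acting as master at once: the ephemeral may be deleted
        # at any point from now on.
        if self.role is Role.MASTER:
            self.role = Role.SAFE_MODE
            self.trace.append("EnterSafeMode")

    def on_reconnected(self):
        # The same session survived, so our ephemeral still exists.
        if self.role is Role.SAFE_MODE:
            self.role = Role.MASTER
            self.trace.append("ExitSafeMode")

    def on_session_expired(self):
        # ZK guarantees the ephemeral is already gone when this arrives.
        if self.role in (Role.MASTER, Role.SAFE_MODE):
            self.trace.append("RelinquishMaster")
        self.role = Role.FOLLOWER

    def may_act_as_master(self):
        return self.role is Role.MASTER
```

Because the guard leaves MASTER on the first connection loss, it never acts as master during the window in which the server may already have deleted its ephemeral.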
> >
> >
> >
> > On Tue, Aug 17, 2010 at 9:26 AM, Benjamin Reed <breed@yahoo-inc.com> wrote:
> >
> > > there are two things to keep in mind when thinking about this issue:
> > >
> > > 1) if a zk client is disconnected from the cluster, the client is
> > > essentially in limbo. because the client cannot talk to a server, it
> > > cannot know if its session is still alive. it also cannot close its
> > > session.
> > >
> > > 2) the client only finds out about session expiration events when the
> > > client reconnects to the cluster. if zk tells a client that its session
> > > is expired, the ephemerals that correspond to that session will already
> > > be cleaned up.
> > >
> > > one of the main design points about zk is that zk only gives correct
> > > information. if zk cannot give correct information, it basically says
> > > "i don't know". connection loss exceptions and disconnected states are
> > > basically "i don't know".
> > >
> > > generally applications we design go into a "safe" mode, meaning they
> > > may serve reads but reject changes, when disconnected from zk and only
> > > kill themselves when they find out their session has expired.
> > >
> > > ben
> > >
> > > ps - session information is replicated to all zk servers, so if a
> > > leader dies, all replicas know the sessions that are currently active
> > > and their timeouts.
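Ben's "safe" mode can be sketched as a gate in front of an application's own requests. A minimal sketch under stated assumptions: SafeModeService, ShutDownError and the callback names are hypothetical, and the caller is assumed to wire the callbacks to the client library's disconnected, reconnected and expired notifications. While disconnected ("i don't know"), reads are still served and writes are refused; only a session expiration shuts the service down.

```python
class ShutDownError(RuntimeError):
    """Raised once the service has shut itself down after session expiration."""


class SafeModeService:
    """Hypothetical service: disconnected -> safe mode, expired -> shut down."""

    def __init__(self):
        self.connected = True
        self.alive = True
        self._data = {}

    # Connection-state callbacks (names are illustrative).
    def on_disconnected(self):
        # "i don't know": the session may or may not still be alive.
        self.connected = False

    def on_reconnected(self):
        # The same session survived, so normal operation can resume.
        self.connected = True

    def on_session_expired(self):
        # Only now is it certain that the session and its ephemerals are gone.
        self.alive = False

    # Application requests.
    def read(self, key):
        if not self.alive:
            raise ShutDownError("session expired")
        return self._data.get(key)

    def write(self, key, value):
        """Apply the write and return True, or return False in safe mode."""
        if not self.alive:
            raise ShutDownError("session expired")
        if not self.connected:
            return False
        self._data[key] = value
        return True
```

Returning False instead of raising in safe mode is one design choice among several; a real service might queue the change or report a retryable error to its caller.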
> > >
> > > On 08/16/2010 09:03 PM, Ted Dunning wrote:
> > >
> > >> Ben or somebody else will have to repeat some of the detailed logic for
> > >> this, but it has to do with the fact that you can't be sure what has
> > >> happened during the network partition.  One possibility is the one you
> > >> describe, but another is that the partition happened because a majority
> > >> of the ZK cluster lost power and you can't see the remaining nodes.
> > >> Those nodes will continue to serve any files in a read-only fashion.  If
> > >> the partition involves you losing contact with the entire cluster at the
> > >> same time as a partition of the cluster into a quorum and a minority
> > >> happens, then your ephemeral files could continue to exist at least
> > >> until the breach in the cluster itself is healed.
> > >>
> > >> Suffice it to say that there are only a few strategies that leave you
> > >> with a coherent picture of the universe.  Importantly, you shouldn't
> > >> assume that the ephemerals will disappear at the same time as the
> > >> session expiration event is delivered.
> > >>
> > >> On Mon, Aug 16, 2010 at 8:31 PM, Qing Yan <qingyan@gmail.com> wrote:
> > >>
> > >>> Ouch, is this the current ZK behavior? This is unexpected. If the
> > >>> client gets partitioned from the ZK cluster, it should get notified and
> > >>> take some action (e.g. commit suicide); otherwise, how can one tell
> > >>> whether an ephemeral node is really up or down? Zombies can create
> > >>> synchronization nightmares.
> > >>>
> > >>>
> > >>>
> > >>> On Mon, Aug 16, 2010 at 7:22 PM, Dave Wright <wrightd@gmail.com> wrote:
> > >>>
> > >>>> Another possible cause for this that I ran into recently with the C
> > >>>> client: you don't get the session expired notification until you are
> > >>>> reconnected to the quorum and it informs you the session is lost. If you
> > >>>> get disconnected and can't reconnect, you won't get the notification.
> > >>>> Personally, I think the client API should track the session expiration
> > >>>> time locally and inform you once it's expired.
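Dave's suggestion can be sketched on the application side. This is a hypothetical helper, not part of any ZooKeeper client API; it assumes the application calls on_server_response whenever it hears from the cluster and knows the negotiated session timeout. Following Ben's point that zk only gives correct information, the result is only a presumption: the cluster decides expiration by its own clock, so the local deadline may fire before or after the server actually expires the session, and it says nothing certain about the ephemerals.

```python
import time


class LocalExpiryTracker:
    """Hypothetical client-side estimate of session expiry (Dave's idea).

    Treat presumed_expired() as "probably expired", never as a guarantee
    that the session's ephemerals have been cleaned up.
    """

    def __init__(self, session_timeout_s, clock=time.monotonic):
        self.session_timeout_s = session_timeout_s
        self.clock = clock  # injectable so the logic can be tested
        self.last_heard = clock()

    def on_server_response(self):
        # Any successful exchange with the cluster restarts the countdown.
        self.last_heard = self.clock()

    def presumed_expired(self):
        return self.clock() - self.last_heard >= self.session_timeout_s
```

An application would typically use this only to decide how long to stay in safe mode before giving up its role, and still rely on the real expiration event for anything that needs certainty.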
> > >>>>
> > >>>> On Aug 16, 2010 2:09 AM, "Qing Yan" <qingyan@gmail.com> wrote:
> > >>>>
> > >>>> Hi Ted,
> > >>>>
> > >>>> Do you mean a GC problem can prevent delivery of the SESSION EXPIRE
> > >>>> event? Hmm... so you have met this problem before?
> > >>>> I didn't see any OOM though; will look into it more.
> > >>>>
> > >>>> On Mon, Aug 16, 2010 at 12:46 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > >>>
> > >>>
> > >>>> I am assuming that y...
