hadoop-zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Membership using ZK
Date Tue, 12 Oct 2010 21:23:22 GMT
Yes.  You should get that event.

You should also debug why you are getting disconnected in the first place.
This is often a symptom of something really bad happening on your client
side, such as very long GC pauses.  If these are unavoidable, then you need
to adjust the ZK session timeout to reflect reality.  Another possibility is
that your network connections are dropping or that your application is
freezing for a non-GC reason.  Any of these problems is something you
should address.
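
To see how a given session timeout relates to the pauses you observe, the
thirds rule from Benjamin Reed's quoted message below can be turned into
plain arithmetic. This is only a sketch: the class and method names are
made up for illustration, nothing here calls the ZooKeeper client, and
the exact timing of a real client may differ.

```java
// Illustrative arithmetic only; no ZooKeeper classes are used.
// SessionTiming and its method names are hypothetical.
final class SessionTiming {
    private SessionTiming() {}

    /** Client tries to contact the server at least this often (1/3 of the timeout). */
    static long heartbeatIntervalMs(long sessionTimeoutMs) {
        return sessionTimeoutMs / 3;
    }

    /** Client reports a connection loss after this much silence (2/3 of the timeout). */
    static long connectionLossAfterMs(long sessionTimeoutMs) {
        return sessionTimeoutMs * 2 / 3;
    }

    /**
     * True if the client is silent for at least the whole timeout, which is
     * the condition under which the server considers the client dead.
     */
    static boolean pauseExceedsTimeout(long pauseMs, long sessionTimeoutMs) {
        return pauseMs >= sessionTimeoutMs;
    }
}
```

For example, with a 30000 ms timeout the client would aim to talk to the
server every 10000 ms and would report a connection loss after 20000 ms of
silence. A 45000 ms GC pause exceeds the timeout, so either the pause or
the timeout has to change.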

Of course, the connection loss event should be handled correctly as well
since honest to god disconnects can happen.
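
As a rough sketch of what handling it correctly could look like, the
decision can be written as a small policy that maps each session state to
an action. The enum below is a local stand-in for the states the real
client reports through Watcher.Event.KeeperState (SyncConnected,
Disconnected, Expired); the class name and the action names are
hypothetical and would need to be wired into your own watcher.

```java
// Sketch of a membership client's reaction to session state changes.
// State is a local stand-in for ZooKeeper's KeeperState values;
// SessionEventPolicy and Action are hypothetical names for illustration.
final class SessionEventPolicy {
    private SessionEventPolicy() {}

    enum State { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    enum Action {
        REFRESH_MEMBERSHIP,        // session is alive: re-read /Membership
        PAUSE_AND_WAIT,            // link is down: stop trusting the local view
        REBUILD_HANDLE_AND_REJOIN  // session is gone: new handle, new ephemeral znode
    }

    static Action onStateChange(State newState) {
        switch (newState) {
            case SYNC_CONNECTED:
                return Action.REFRESH_MEMBERSHIP;
            case DISCONNECTED:
                // The session may still be valid, so do not recreate anything yet.
                return Action.PAUSE_AND_WAIT;
            case EXPIRED:
                // The old handle is useless after expiry.
                return Action.REBUILD_HANDLE_AND_REJOIN;
            default:
                throw new IllegalArgumentException("unknown state: " + newState);
        }
    }
}
```

The point of keeping DISCONNECTED and EXPIRED separate is that a
disconnect alone does not mean the ephemeral znode is gone. Only an
expired session requires a new handle and recreating the entry under
/Membership.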

On Tue, Oct 12, 2010 at 10:57 AM, Avinash Lakshman <
avinash.lakshman@gmail.com> wrote:

> Would my watcher get invoked on this ConnectionLoss event? If so, I am
> thinking I will check for KeeperState.Disconnected and reset my state. Is
> my understanding correct? Please advise.
>
> Thanks
> Avinash
>
> On Tue, Oct 12, 2010 at 10:45 AM, Benjamin Reed <breed@yahoo-inc.com>
> wrote:
>
> >  ZooKeeper considers a client dead when it hasn't heard from that client
> > during the timeout period. Clients make sure to communicate with ZooKeeper
> > at least once in 1/3 the timeout period. If the client doesn't hear from
> > ZooKeeper in 2/3 the timeout period, the client will issue a ConnectionLoss
> > event and cause outstanding requests to fail with a ConnectionLoss.
> >
> > So, if ZooKeeper decides a process is dead, the process will get a
> > ConnectionLoss event. Once ZooKeeper decides that a client is dead, if the
> > client reconnects, the client will get a SessionExpired. Once a session is
> > expired, the expired handle will become useless, so no new requests, no
> > watches, etc.
> >
> > The bottom line is if your process gets a SessionExpired, you need to
> > treat that process as expired and recover by creating a new ZooKeeper
> > handle (possibly by restarting the process) and setting up your state
> > again.
> >
> > ben
> >
> >
> > On 10/12/2010 09:54 AM, Avinash Lakshman wrote:
> >
> >> This is what I have going:
> >>
> >> I have a bunch of 200 nodes come up and create an ephemeral entry under a
> >> znode named /Membership. When a node is detected dead, its entry under
> >> /Membership is deleted and a watch is delivered to the rest of the
> >> members. Now there are circumstances in which a node A is deemed dead
> >> while the process is still up and running on A. It is a false detection
> >> which I probably need to deal with. How do I deal with this situation?
> >> Over time, false detections delete all the entries underneath the
> >> /Membership znode even though all processes are up and running.
> >>
> >> So my questions are:
> >> Would the watches be pushed out to the node that is falsely deemed dead?
> >> If so, I can have that process recreate the ephemeral znode underneath
> >> /Membership.
> >> If a node leaves a watch and then truly crashes, would it get the watches
> >> it missed during the interim period when it comes back up? In any case,
> >> how do watches behave in the event of false/true failure detection?
> >>
> >> Thanks
> >> A
> >>
> >
> >
>
