zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: ZK Client won't time out when quorum irrevocably goes away
Date Fri, 04 Feb 2011 01:37:55 GMT
Ryan... just as an aside, you can move a ZK cluster a step at a time by
adding new nodes and then decommissioning old nodes.
Your clients may get confused if they try to reconnect and never really
learned about the whole quorum, but you can do quite a lot with rolling
restarts as far as getting the ZK state itself walked over to new machines.

On Thu, Feb 3, 2011 at 4:06 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:

> Yes, thanks. I was a little loose with the language, not realizing
> there are specific states, etc.
>
> So in our scenario here, the quorum has moved, and the clients will
> (a) never reconnect and (b) never be able to find the new quorum
> location because IP addresses are cached.  If either of the following
> had happened, we might have been in a better situation:
> - The client refreshed from DNS (although the JVM seems to have a DNS
> cache which has hosed us as well)
> - The client expired the session
>
> Reading the FAQ, it seems like the onus might be on the client to
> check for session disconnect and compare it against the negotiated
> session timeout to determine "oh hey, we haven't talked to ZK in a
> while, let's quit".  Is that an expected client task?
>
> Thanks for the quick reply!
> -ryan
>
> On Thu, Feb 3, 2011 at 4:01 PM, Patrick Hunt <phunt@apache.org> wrote:
> > On Thu, Feb 3, 2011 at 2:57 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> >> The result was the client never realized that its session was
> >> actually timed out, and the HBase processes continued to run. Kill -9
> >> and a restart fixed it.
> >
> > Hi Ryan,
> >
> > There are two issues at play here, session timeout and session
> > expiration. Correct me if I'm wrong, but I think you meant to say "the
> > client never realized that its session was actually _expired_". That
> > is correct behavior. Clients can only determine whether a session has
> > expired once they reconnect to the cluster. Session timeout, on the
> > other hand, happens when the server heartbeat is not received by the
> > client within the session timeout period. Clients that are disconnected
> > from the cluster will keep attempting to reconnect until
> > they are successful. When a client is disconnected, the client's
> > watchers will be notified about the disconnect (the same goes for
> > expiration).
> >
> > See questions 1 & 2 here in the faq, specifically "Example state
> > transitions" in question 2:
> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/FAQ
> > Your clients were stuck between steps 4 and 5 (which they will never
> > reach in your scenario).
> >
> > Does that help?
> >
> > Patrick
> >
>
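
The client-side check Ryan asks about (compare time spent disconnected against the negotiated session timeout, then quit) could be sketched roughly as below. This is only an illustration under stated assumptions, not something the ZooKeeper client does for you: DisconnectWatchdog and its method names are hypothetical, and the class deliberately has no ZooKeeper dependency. In a real client one would presumably call onDisconnected() from Watcher.process() on KeeperState.Disconnected, call onConnected() on KeeperState.SyncConnected, and take the timeout from ZooKeeper.getSessionTimeout().

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper: remembers when the client lost its connection and
 * reports when it has been disconnected for longer than the session timeout.
 */
final class DisconnectWatchdog {
    // Sentinel meaning "currently connected".
    private static final long CONNECTED = Long.MIN_VALUE;

    private final long sessionTimeoutMs;
    private final AtomicLong disconnectedSinceMs = new AtomicLong(CONNECTED);

    DisconnectWatchdog(long sessionTimeoutMs) {
        if (sessionTimeoutMs <= 0) {
            throw new IllegalArgumentException("sessionTimeoutMs must be positive");
        }
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** Record the first disconnect; repeated notifications keep the original time. */
    void onDisconnected(long nowMs) {
        disconnectedSinceMs.compareAndSet(CONNECTED, nowMs);
    }

    /** Clear the disconnect marker once the client is connected again. */
    void onConnected() {
        disconnectedSinceMs.set(CONNECTED);
    }

    /** True if the client has been disconnected for longer than the session timeout. */
    boolean shouldGiveUp(long nowMs) {
        long since = disconnectedSinceMs.get();
        return since != CONNECTED && nowMs - since > sessionTimeoutMs;
    }

    public static void main(String[] args) {
        DisconnectWatchdog watchdog = new DisconnectWatchdog(30_000);
        watchdog.onDisconnected(0);
        System.out.println(watchdog.shouldGiveUp(10_000)); // false: still within the timeout
        System.out.println(watchdog.shouldGiveUp(31_000)); // true: disconnected too long
    }
}
```

A periodic task would poll shouldGiveUp() with the current time and shut the process down (or build a fresh client) when it returns true. Note that this only tells the application that the session has probably expired on the server side; as Patrick says, the client cannot confirm expiration until it reconnects.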

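On the JVM DNS cache Ryan mentions: the JVM's caching of successful lookups is governed by the networkaddress.cache.ttl security property, which can be set programmatically as in the sketch below. The class and method names are hypothetical, and the property generally has to be set before the first lookup to take effect. This only limits how long the JVM itself caches a name; it does not make a client that has already resolved and stored IP addresses look them up again.

```java
import java.security.Security;

/** Hypothetical helper that bounds the JVM's positive DNS cache lifetime. */
final class DnsCacheConfig {
    private DnsCacheConfig() {}

    /**
     * Sets the JVM-wide TTL (in seconds) for successful name lookups.
     * Call this early, before any host name has been resolved.
     */
    static void limitDnsCaching(int ttlSeconds) {
        if (ttlSeconds < 0) {
            // A negative value means "cache forever", which is what we want to avoid.
            throw new IllegalArgumentException("ttlSeconds must be >= 0");
        }
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(ttlSeconds));
    }

    public static void main(String[] args) {
        limitDnsCaching(60);
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```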