zookeeper-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: ZK Client won't time out when quorum irrevocably goes away
Date Fri, 04 Feb 2011 00:06:14 GMT
Yes, thanks. I was a little loose with the language, not realizing there
are specific states, etc.

So in our scenario, the quorum has moved, and the clients will (a) never
reconnect and (b) never find the new quorum location, because the IP
addresses are cached. If either:
- The client re-resolved the server hostnames via DNS (although the JVM
seems to have its own DNS cache, which has hosed us as well)
- The client expired the session

We might have been in a better situation.
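On the DNS point: the JVM's own lookup cache is controlled by the `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` security properties. Below is a minimal sketch, assuming you can run code at process startup before any hostname is resolved; the 30 s and 10 s values are illustrative assumptions, not recommendations. Note that this only helps if the ZooKeeper client actually re-resolves its connect-string hostnames. Depending on the client version, it may resolve them only once, when the `ZooKeeper` handle is created, in which case a new handle is needed to pick up new addresses.

```java
import java.security.Security;

// Sketch: shorten the JVM's DNS caching so that later lookups can see servers that moved.
class DnsCacheTtl {
    // Illustrative values in seconds (assumptions, not recommendations).
    static final String POSITIVE_TTL_SECONDS = "30";
    static final String NEGATIVE_TTL_SECONDS = "10";

    // The JVM reads these properties when its address-cache policy is initialized,
    // so call this at startup, before the first hostname lookup.
    static void configureDnsCache() {
        Security.setProperty("networkaddress.cache.ttl", POSITIVE_TTL_SECONDS);
        Security.setProperty("networkaddress.cache.negative.ttl", NEGATIVE_TTL_SECONDS);
    }

    public static void main(String[] args) {
        configureDnsCache();
        System.out.println("networkaddress.cache.ttl=" + Security.getProperty("networkaddress.cache.ttl"));
        // ... only after this, create the ZooKeeper handle / start the HBase process.
    }
}
```

The same properties can also be set in the JRE's `java.security` file instead of in code.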

Reading the FAQ, it seems the onus might be on the client to watch for
the Disconnected state, track how long it has lasted, and compare that
against the negotiated session timeout to decide "oh hey, we haven't
talked to ZK in a while, let's quit". Is that an expected client task?
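A client-side check like the one described above could look roughly like this. It is a sketch, not an official ZooKeeper API: record when the watcher first sees `Disconnected`, clear it on `SyncConnected`, and give up once the disconnect has lasted at least the negotiated session timeout, on the reasoning that the cluster will have expired a session it has not heard from for that long. The `DisconnectDeadline` class and its local `State` enum are hypothetical stand-ins for `Watcher.Event.KeeperState`; timestamps are passed in so the logic can be tested without a live ensemble.

```java
// Hypothetical helper: decides when a disconnected ZooKeeper client should stop waiting.
class DisconnectDeadline {
    // Local stand-in mirroring some Watcher.Event.KeeperState values.
    enum State { SyncConnected, Disconnected, Expired }

    private final long sessionTimeoutMs;     // negotiated session timeout
    private long disconnectedSinceMs = -1;   // -1 means "currently connected"
    private boolean expired = false;

    DisconnectDeadline(long negotiatedSessionTimeoutMs) {
        this.sessionTimeoutMs = negotiatedSessionTimeoutMs;
    }

    // Feed every connection-state event the watcher sees, with the time it was observed.
    synchronized void onState(State state, long nowMs) {
        switch (state) {
            case SyncConnected:
                disconnectedSinceMs = -1;    // reconnected: reset the clock
                break;
            case Disconnected:
                if (disconnectedSinceMs < 0) {
                    disconnectedSinceMs = nowMs;   // remember only the first disconnect
                }
                break;
            case Expired:
                expired = true;              // the cluster told us explicitly
                break;
        }
    }

    // True once expiry was reported, or the disconnect has lasted >= the session timeout.
    synchronized boolean shouldGiveUp(long nowMs) {
        if (expired) {
            return true;
        }
        return disconnectedSinceMs >= 0 && nowMs - disconnectedSinceMs >= sessionTimeoutMs;
    }
}
```

In a real client you might feed it from `Watcher.process(WatchedEvent)` using `event.getState()`, size it with `ZooKeeper.getSessionTimeout()` (the negotiated value, meaningful once connected), drive it from a monotonic clock, poll `shouldGiveUp` from a timer thread, and then close the handle and exit or restart the process.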

Thanks for the quick reply!

On Thu, Feb 3, 2011 at 4:01 PM, Patrick Hunt <phunt@apache.org> wrote:
> On Thu, Feb 3, 2011 at 2:57 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>> The result was the client never realized that its session was
>> actually timed out, and the HBase processes continued to run. Kill -9
>> and a restart fixed it.
> Hi Ryan,
> there are two issues at play here, session timeout and session
> expiration. Correct me if I'm wrong, but I think you meant to say "the
> client never realized that its session was actually _expired_". That
> is correct behavior. Clients can only determine if a session is
> expired once they reconnect to the cluster. Session timeout, on the
> other hand, happens when the server heartbeat is not received by the
> client within the session timeout period. Clients who are disconnected
> from the cluster will attempt to reconnect to the cluster until
> they are successful. When a client is disconnected, the client's
> watchers will be notified about the disconnect (the same is true for
> expiration).
> See questions 1 & 2 in the FAQ, specifically "Example state
> transitions" in question 2:
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/FAQ
> Your clients were stuck between steps 4 and 5 (and will never reach
> step 5 in your scenario).
> Does that help?
> Patrick
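The distinction Patrick draws can be illustrated with a toy model (not the real client code): the cluster, not the client, decides expiry, and a disconnected client only learns the verdict when a reconnect attempt reaches some server. If no server is ever reachable, as when the quorum has moved to addresses the client cannot find, the model stays `DISCONNECTED` forever, which matches the stuck clients in the scenario above. The class, its state names, and the `serverReachable` flag are all hypothetical.

```java
// Toy model: expiry is decided by the cluster and only reported to the client on reconnect.
class SessionModel {
    enum ClientState { CONNECTED, DISCONNECTED, EXPIRED }

    private final long sessionTimeoutMs;
    private long lastHeardByServerMs = 0;   // last time the cluster heard from this client
    private ClientState state = ClientState.CONNECTED;

    SessionModel(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    // Connection lost: the client only becomes DISCONNECTED and keeps retrying.
    void connectionLost(long nowMs) {
        if (state == ClientState.CONNECTED) {
            lastHeardByServerMs = nowMs;
            state = ClientState.DISCONNECTED;
        }
    }

    // One reconnect attempt. Without a reachable server, nothing can change.
    void reconnectAttempt(long nowMs, boolean serverReachable) {
        if (state != ClientState.DISCONNECTED || !serverReachable) {
            return;   // still DISCONNECTED: the client cannot learn about expiry
        }
        // A reachable server reports whether the session outlived its timeout.
        state = (nowMs - lastHeardByServerMs >= sessionTimeoutMs)
                ? ClientState.EXPIRED
                : ClientState.CONNECTED;
    }

    ClientState state() {
        return state;
    }
}
```

Reconnecting before the timeout resumes the same session; reconnecting after it yields expiry; never reconnecting yields neither.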
