Date: Thu, 3 Feb 2011 16:01:16 -0800
From: Patrick Hunt
To: user@zookeeper.apache.org, ryanobjc@gmail.com
Subject: Re: ZK Client won't time out when quorum irrevocably goes away

On Thu, Feb 3, 2011 at 2:57 PM, Ryan Rawson wrote:
> The result was the client never realized that it's session was
> actually timed out, and the HBase processes continued to run. Kill -9
> and a restart fixed it.

Hi Ryan,

There are two separate issues at play here: session timeout and session expiration. Correct me if I'm wrong, but I think you meant to say "the client never realized that its session was actually _expired_". That is the correct behavior: a client can only determine that its session has expired once it reconnects to the cluster.

Session timeout, on the other hand, happens when the client does not receive the server's heartbeat within the session timeout period. Clients that are disconnected from the cluster will keep attempting to reconnect until they succeed. When a client is disconnected, its watchers are notified of the disconnect (the same is true for expiration).

See questions 1 and 2 in the FAQ, specifically "Example state transitions" in question 2:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/FAQ

Your clients were stuck between steps 4 and 5 (step 5 is never reached in your scenario).

Does that help?

Patrick
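The behavior described above can be illustrated with a toy simulation. While the cluster is unreachable the client only sees a disconnect, and it learns about expiration only after it reconnects. This is a minimal sketch, not the real org.apache.zookeeper client. The classes `Cluster`, `Client`, `ClientState` and the `tick` method are hypothetical. The server-side check is also simplified to "time since the last heartbeat exceeds the session timeout", which ignores details such as leader election.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the session states a ZooKeeper client's
 * watcher observes. NOT the real org.apache.zookeeper API: all names here are
 * illustrative assumptions.
 */
public class SessionStateSketch {

    // Client-visible states, loosely mirroring Watcher.Event.KeeperState.
    enum ClientState { CONNECTED, DISCONNECTED, EXPIRED }

    // Simplified cluster: only the cluster side decides that a session expired.
    static final class Cluster {
        final long sessionTimeoutMs;
        long lastHeardFromClientMs;
        boolean reachable = true;

        Cluster(long sessionTimeoutMs, long nowMs) {
            this.sessionTimeoutMs = sessionTimeoutMs;
            this.lastHeardFromClientMs = nowMs;
        }

        boolean sessionExpired(long nowMs) {
            return nowMs - lastHeardFromClientMs > sessionTimeoutMs;
        }
    }

    static final class Client {
        ClientState state = ClientState.CONNECTED;
        final List<ClientState> notifications = new ArrayList<>();

        // Record a state change as a watcher notification.
        private void notifyWatcher(ClientState s) {
            state = s;
            notifications.add(s);
        }

        /** One heartbeat/reconnect attempt at time nowMs; returns the resulting state. */
        ClientState tick(Cluster cluster, long nowMs) {
            if (state == ClientState.EXPIRED) {
                return state; // terminal: the application must create a new session
            }
            if (!cluster.reachable) {
                // No response from the cluster: the client only knows it is disconnected.
                if (state == ClientState.CONNECTED) {
                    notifyWatcher(ClientState.DISCONNECTED);
                }
                // It keeps retrying and cannot tell whether the session has expired.
                return state;
            }
            // Reconnected: only now can the cluster report the expiration.
            if (cluster.sessionExpired(nowMs)) {
                notifyWatcher(ClientState.EXPIRED);
            } else {
                cluster.lastHeardFromClientMs = nowMs;
                if (state == ClientState.DISCONNECTED) {
                    notifyWatcher(ClientState.CONNECTED);
                }
            }
            return state;
        }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster(30_000, 0);
        Client client = new Client();

        client.tick(cluster, 1_000);    // connected, heartbeat acknowledged
        cluster.reachable = false;      // quorum goes away
        client.tick(cluster, 2_000);    // watcher is told DISCONNECTED
        client.tick(cluster, 600_000);  // far past the timeout: still DISCONNECTED
        System.out.println("while unreachable: " + client.state);

        cluster.reachable = true;       // cluster comes back
        client.tick(cluster, 601_000);  // EXPIRED is learned only now
        System.out.println("after reconnect:   " + client.state);
        System.out.println("watcher saw:       " + client.notifications);
    }
}
```

However long the cluster stays unreachable, the simulated client remains DISCONNECTED. EXPIRED appears only after a successful reconnect. In the real Java client these correspond to the `Watcher.Event.KeeperState.Disconnected` and `Watcher.Event.KeeperState.Expired` notifications.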