Message-ID: <49652F32.8010204@apache.org>
Date: Wed, 07 Jan 2009 14:39:46 -0800
From: Patrick Hunt
To: zookeeper-user@hadoop.apache.org
Subject: Re: Simpler ZooKeeper event interface....
In-Reply-To: <49652077.6040802@sun.com>

Vinod Johnson wrote:
> Mahadev Konar wrote:
>> Hi Vinod,
>> I think what Ben meant was this--
>>
>> The client will never know of a session expiration until and unless
>> its connected to one of the servers. So the leader cannot demote
>> itself since its connected to one of the servers. It might have lost
>> its session (which all the others except itself would have realized)
>> but will have to wait to demote itself until it connects to one of
>> the servers.
>>
> I guess then I don't follow the leader election recipe. Is the
> following scenario possible in the leader election recipe:
> 1) Leader L is partitioned from the ensemble.
> 2) ZK servers expire its session.
> 3) Some other follower F now becomes a leader.
> 4) L and F form a split brain?
>
> I had wrongly assumed that the session was like a lease in that it
> allowed the client and server to independently know that the session
> had expired by the use of the global clock. Wouldn't it make sense for
> the client lib to expire its local session handle and never reuse it?

Here's a good reason for each client to know its session status
(connected/disconnected/expired). Depending on the application, if L
does not have a connected session to the ensemble it may need to be
careful how it acts. I'm trying to think through some cases...

In the case of a passive leader, the followers will look at zk and only
send requests to the leader, so this seems fine (L no longer gets
requests, it syncs to the ensemble at some point and finds its session
expired, it recovers as appropriate).

In the case of an active leader, L continues to send commands (whatever)
to the followers. However, a new leader L' has since been elected and is
also sending commands to the followers.
In this case it seems like either a) L should not send commands unless
it's sync'd to the ensemble (and holds the leader token) or b) followers
should not accept commands from a non-leader (only accept from the
current leader).

a) seems the right way to go: if L is disconnected it should stop
sending commands to the followers. If it's resync'd in time it can start
sending commands again; otherwise its session will expire, a new leader
L' will be elected, and L' will start sending commands to the followers.
Eventually L will resync, notice that it is no longer the leader, and do
whatever it takes to recover.

> Wouldn't it make sense for the
> client lib to expire its local session handle and never reuse it?

I would think that depends on how expensive it is to change leaders. It
would be trivial for the client to close its session and start a new one
each time it's notified of a disconnect from the ensemble.

Patrick

>> mahadev
>>
>> On 1/7/09 10:02 AM, "Vinod Johnson" wrote:
>>
>>> Benjamin Reed wrote:
>>>
>>>> You don't demote yourself on disconnect. (Everyone else may still
>>>> believe you are the leader.) Check out the "Things to Remember
>>>> about Watches" section in the programmer's guide.
>>>>
>>>> When you are disconnected from ZK you don't know what is happening,
>>>> so you have to act conservatively. Your session may or may not have
>>>> expired. You will not know for sure until you reconnect to ZK.
>>>>
>>> Just to make sure I'm not misunderstanding the last bit, even without
>>> reconnecting to ZK, the leader's session could expire at the client
>>> side, correct? In that case the conservative thing for the leader to
>>> do is to demote itself if the intent is to avoid split brain (even
>>> though the session may still be active at ZK for some period of time
>>> after this).
>>>
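
For illustration only, option a) from the reply above could be sketched
roughly as follows. This is not ZooKeeper client code: SessionState,
LeaderGate and every method name here are hypothetical, and the sketch
assumes the application is told about session state changes somehow (the
Java client reports them to a Watcher as KeeperState values such as
SyncConnected, Disconnected and Expired). As noted in the thread, the
expired event would only arrive after the client reconnects.

```python
from enum import Enum


class SessionState(Enum):
    """Hypothetical session states, mirroring connected/disconnected/expired."""
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    EXPIRED = "expired"


class LeaderGate:
    """Decides whether this process may act as an active leader.

    Assumption: the application calls on_session_event() whenever the
    client library reports a session state change, and on_elected() once
    it has won the election under its current session.
    """

    def __init__(self):
        self.state = SessionState.DISCONNECTED
        self.holds_token = False

    def on_session_event(self, state):
        self.state = state
        if state is SessionState.EXPIRED:
            # The old session is gone, so the leader token went with it.
            self.holds_token = False

    def on_elected(self):
        self.holds_token = True

    def may_send_commands(self):
        # Option a): send only while sync'd to the ensemble AND leader.
        return self.state is SessionState.CONNECTED and self.holds_token


gate = LeaderGate()
gate.on_session_event(SessionState.CONNECTED)
gate.on_elected()
leading = gate.may_send_commands()            # connected and elected

gate.on_session_event(SessionState.DISCONNECTED)
paused = gate.may_send_commands()             # partitioned: stop sending

gate.on_session_event(SessionState.CONNECTED)
resumed = gate.may_send_commands()            # resync'd in time

gate.on_session_event(SessionState.DISCONNECTED)
gate.on_session_event(SessionState.EXPIRED)   # learned only after reconnecting
gate.on_session_event(SessionState.CONNECTED) # assumed: a fresh session
after_expiry = gate.may_send_commands()       # must win the election again
```

The gate only covers L's side. It does nothing for followers that keep
accepting commands from a stale leader, which is what option b) would
address.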