From: Jérémie BORDIER
To: user@zookeeper.apache.org
Date: Fri, 25 Nov 2011 15:44:12 +0100
Subject: Re: Missing session state handling in most Leader Election implementations

Just a quick post to point out that the leader election example that was
posted on the list earlier today is very clean and handles the
disconnected / expired cases.

https://github.com/cyberroadie/zookeeper-leader/

Jérémie

On Fri, Nov 18, 2011 at 7:04 PM, Jordan Zimmerman wrote:
> I just did a quickie test. If the cluster goes down you get the Disconnect
> but do not get a session expiration. So, there wouldn't be an opportunity
> to transition from SUSPENDED to LOST (unless the client makes another ZK
> call). So, this brings me back to doing the background sync().
>
> -JZ
>
> On 11/18/11 9:52 AM, "Ted Dunning" wrote:
>
>>Is the background sync even necessary? The ZK client itself will
>>re-establish connection if it can.
>>
>>I think that LOST should only be sent on session expiration.
>>
>>On Fri, Nov 18, 2011 at 1:07 AM, Jordan Zimmerman
>>wrote:
>>
>>> FYI
>>>
>>> Curator now has a staged connection notification mechanism for dealing
>>> with issues like this. When the Curator managed connection receives a
>>> Disconnect, it posts a message to listeners that the connection is
>>> SUSPENDED. If the connection can be re-established (via a background
>>> sync() using the current retry policy) the listeners receive RECONNECTED
>>> otherwise they receive LOST. Thus, users of the Curator LeaderSelector
>>> can know if they should pause their leader activity and/or stop leader
>>> activity.
>>>
>>> -JZ
>>> ________________________________________
>>> From: Ted Dunning [ted.dunning@gmail.com]
>>> Sent: Monday, November 14, 2011 6:24 PM
>>> To: user@zookeeper.apache.org
>>> Subject: Re: Missing session state handling in most Leader Election
>>> implementations
>>>
>>> On Mon, Nov 14, 2011 at 2:41 PM, Jordan Zimmerman
>>> wrote:
>>>
>>> > It turns out that this is tricky to solve. When the server you're
>>> > connected to goes down, you get a
>>> > Watcher.Event.KeeperState.Disconnected.
>>> > However, it could be that you are able to reconnect to another server
>>> > so the disconnected event should be ignored.
>>>
>>>
>>> The event should not be ignored. The master should pause in being a
>>> master, but not unload any major data structures. If it reconnects
>>> instantly, then it should continue as if nothing had happened. You can
>>> also have a time limit for how long you wait before you decide to pause
>>> operation as master. As you increase that time, you increase the
>>> probability of two masters existing at the same time. If the reconnect
>>> happens before the timeout, you don't need to bother the master.
>>>
>
>

-- 
Jérémie 'ahFeel' BORDIER
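
For readers of the archive, the staged notification discussed in this thread (a Disconnected event leads to SUSPENDED, followed by either RECONNECTED or LOST) might be sketched roughly as the small state machine below. This is only an illustration under stated assumptions: it uses no ZooKeeper or Curator API, the names LeaderGuard, ConnState, suspend_timeout and tick are hypothetical, and a plain local timer stands in for the background sync() with retry policy mentioned above. The timer is there because, as noted in the thread, no session expiration is delivered while the cluster is unreachable.

```python
from enum import Enum


class ConnState(Enum):
    CONNECTED = "CONNECTED"
    SUSPENDED = "SUSPENDED"
    LOST = "LOST"


class LeaderGuard:
    """Hypothetical helper (not a ZooKeeper/Curator class) that tracks
    whether this process may currently act as leader."""

    def __init__(self, suspend_timeout):
        # Assumption: seconds we tolerate being disconnected before giving up.
        self.suspend_timeout = suspend_timeout
        self.state = ConnState.CONNECTED
        self.suspended_at = None
        self.events = []  # notifications a listener would have received

    def on_disconnected(self, now):
        # Disconnected: pause leader activity, keep data structures loaded.
        if self.state is ConnState.CONNECTED:
            self.state = ConnState.SUSPENDED
            self.suspended_at = now
            self.events.append("SUSPENDED")

    def on_reconnected(self, now):
        # Reconnected within the same session: resume as if nothing happened.
        # After LOST this is ignored; a new session and election are needed.
        if self.state is ConnState.SUSPENDED:
            self.state = ConnState.CONNECTED
            self.suspended_at = None
            self.events.append("RECONNECTED")

    def on_session_expired(self):
        # Session expiration: leadership is gone, stop leader activity.
        if self.state is not ConnState.LOST:
            self.state = ConnState.LOST
            self.suspended_at = None
            self.events.append("LOST")

    def tick(self, now):
        # Local timer: declare LOST if we stay suspended for too long,
        # since no expiration event arrives while the cluster is down.
        if (self.state is ConnState.SUSPENDED
                and now - self.suspended_at >= self.suspend_timeout):
            self.on_session_expired()

    def may_lead(self):
        return self.state is ConnState.CONNECTED
```

A caller would check may_lead() before each unit of leader work and call tick() periodically. As pointed out in the thread, a longer suspend_timeout raises the probability of two masters existing at the same time, so the value is a trade-off rather than a safe default.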