Subject: Re: leader election, scheduled tasks, losing leadership
From: Jordan Zimmerman
Date: Sat, 8 Dec 2012 21:00:44 -0800
To: user@zookeeper.apache.org

The leader latch lock is the equivalent of task in progress. I assume the task is running in the same VM as the leader lock. The only reason the VM would lose leadership is if it crashes, in which case the process would die anyway.

-JZ

On Dec 8, 2012, at 8:56 PM, Eric Pederson wrote:

> If I recall correctly it was Henry Robinson that gave me the advice to have
> a "task in progress" check.
>
> -- Eric
>
> On Sat, Dec 8, 2012 at 11:54 PM, Eric Pederson wrote:
>
>> I am using Curator LeaderLatch :)
>>
>> -- Eric
>>
>> On Sat, Dec 8, 2012 at 11:52 PM, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:
>>
>>> You might check your leader implementation. Writing a correct leader
>>> recipe is actually quite challenging due to edge cases. Have a look at
>>> Curator (disclosure: I wrote it) for an example.
>>>
>>> -JZ
>>>
>>> On Dec 8, 2012, at 8:49 PM, Eric Pederson wrote:
>>>
>>>> Actually I had the same thought and didn't consider having to do this until
>>>> I talked about my project at a Zookeeper User Group a month or so ago and I
>>>> was given this advice.
>>>>
>>>> I know that I do see leadership being lost/transferred when one of the ZK
>>>> servers is restarted (not the whole ensemble). And it seems like I've
>>>> seen it happen even when the ensemble stays totally stable (though I am not
>>>> 100% sure as it's been a while since I have worked on this particular
>>>> application).
>>>>
>>>> -- Eric
>>>>
>>>> On Sat, Dec 8, 2012 at 11:25 PM, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:
>>>>
>>>>> Why would it lose leadership? The only reason I can think of is if the ZK
>>>>> cluster goes down. In normal use, the ZK cluster won't go down (I assume
>>>>> you're running 3 or 5 instances).
>>>>>
>>>>> -JZ
>>>>>
>>>>> On Dec 8, 2012, at 8:17 PM, Eric Pederson wrote:
>>>>>
>>>>>> During the time the task is running a cluster member could lose its
>>>>>> leadership.
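
For readers following the thread, here is a minimal sketch of the kind of check being discussed: a scheduled task that re-checks leadership before each unit of work and stops early if leadership is gone. It is an illustration under stated assumptions, not code from anyone on this thread. The class `GuardedTask` and the method `runWhileLeader` are hypothetical names, and the leadership check is passed in as a plain `BooleanSupplier` so the sketch runs without a ZooKeeper ensemble. With Curator, the supplier would most likely be `latch::hasLeadership` on a started `LeaderLatch`.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Curator or ZooKeeper.
final class GuardedTask {
    private GuardedTask() {}

    /**
     * Runs the steps in order, asking hasLeadership before each one.
     * Stops at the first step for which leadership is no longer held.
     * Returns the number of steps that ran.
     */
    static int runWhileLeader(BooleanSupplier hasLeadership, List<Runnable> steps) {
        int completed = 0;
        for (Runnable step : steps) {
            if (!hasLeadership.getAsBoolean()) {
                break; // leadership lost (or never held): leave the rest to the new leader
            }
            step.run();
            completed++;
        }
        return completed;
    }
}
```

Note that a check like this is best effort only: leadership can still change between the check and the end of a step, so each step should be safe to repeat or to be picked up by another member.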