Subject: Re: leader election, scheduled tasks, losing leadership
From: Jordan Zimmerman <jordan@jordanzimmerman.com>
Date: Sat, 8 Dec 2012 21:00:44 -0800
To: user@zookeeper.apache.org

The leader latch lock is the equivalent of a "task in progress" check. I
assume the task is running in the same VM as the leader lock. The only
reason the VM would lose leadership is if it crashes, in which case the
process would die anyway.
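
As a rough illustration of that point (a sketch only, not Curator's API):
FakeLeaderLatch and run_scheduled_task below are hypothetical names, and
Python is used just to keep it short. The latch itself acts as the "task in
progress" marker: the task starts only if this process holds the latch, and
it re-checks the latch before each unit of work.

```python
import threading


class FakeLeaderLatch:
    """Hypothetical in-memory stand-in for a leader latch (not Curator's API)."""

    def __init__(self):
        self._leader = threading.Event()

    def acquire(self):
        # Pretend this process has just won the election.
        self._leader.set()

    def lose(self):
        # Pretend leadership was lost (for example, the session expired).
        self._leader.clear()

    def has_leadership(self):
        return self._leader.is_set()


def run_scheduled_task(latch, steps):
    """Run each step only while the latch is held; return the step results."""
    done = []
    for step in steps:
        # Holding the latch is what marks the task as "in progress" on this
        # node, so no separate flag is kept. Stop at the first failed check.
        if not latch.has_leadership():
            break
        done.append(step())
    return done
```

A node that is not the leader runs nothing, and a node that loses the latch
part way through stops before the next step rather than finishing the task.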

-JZ

On Dec 8, 2012, at 8:56 PM, Eric Pederson <ericacm@gmail.com> wrote:

> If I recall correctly it was Henry Robinson that gave me the advice to have
> a "task in progress" check.
>
>
> -- Eric
>
>
>
> On Sat, Dec 8, 2012 at 11:54 PM, Eric Pederson <ericacm@gmail.com> wrote:
>
>> I am using Curator LeaderLatch :)
>>
>>
>> -- Eric
>>
>>
>>
>>
>> On Sat, Dec 8, 2012 at 11:52 PM, Jordan Zimmerman <
>> jordan@jordanzimmerman.com> wrote:
>>
>>> You might check your leader implementation. Writing a correct leader
>>> recipe is actually quite challenging due to edge cases. Have a look at
>>> Curator (disclosure: I wrote it) for an example.
>>>
>>> -JZ
>>>
>>> On Dec 8, 2012, at 8:49 PM, Eric Pederson <ericacm@gmail.com> wrote:
>>>
>>>> Actually I had the same thought and didn't consider having to do this
>>>> until I talked about my project at a Zookeeper User Group a month or so
>>>> ago and I was given this advice.
>>>>
>>>> I know that I do see leadership being lost/transferred when one of the
>>>> ZK servers is restarted (not the whole ensemble). And it seems like I've
>>>> seen it happen even when the ensemble stays totally stable (though I am
>>>> not 100% sure as it's been a while since I have worked on this
>>>> particular application).
>>>>
>>>>
>>>>
>>>> -- Eric
>>>>
>>>>
>>>>
>>>> On Sat, Dec 8, 2012 at 11:25 PM, Jordan Zimmerman <
>>>> jordan@jordanzimmerman.com> wrote:
>>>>
>>>>> Why would it lose leadership? The only reason I can think of is if the
>>>>> ZK cluster goes down. In normal use, the ZK cluster won't go down (I
>>>>> assume you're running 3 or 5 instances).
>>>>>
>>>>> -JZ
>>>>>
>>>>> On Dec 8, 2012, at 8:17 PM, Eric Pederson <ericacm@gmail.com> wrote:
>>>>>
>>>>>> During the time the task is running a cluster member could lose its
>>>>>> leadership.
>>>>>
>>>>>
>>>
>>>
>>