curator-user mailing list archives

From Arie Zilberstein <azilberst...@salesforce.com>
Subject Re: Switching from State suspended, to lost, to suspended
Date Thu, 14 Nov 2013 16:11:56 GMT
Henrik,

You should be able to transactionally test for leadership and update a
state variable in ZooKeeper.
This is something that I requested a few weeks ago in a thread named
"Atomically setting a node's data while having leadership", and I hope it
will be implemented. Personally I think it is a must-have capability.

In your scenario, however, since you must update a database, there is a
race condition that cannot be readily resolved (without some kind of
distributed transactions). You can test for leadership and then update the
DB, but there is no guarantee that the leadership is still yours by the end
of your DB update call.
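
As a rough illustration of "test for leadership and then update the DB",
here is a minimal sketch that avoids relying on the thread interrupt
status, which other code may clear. It is not Curator API: ConnState is a
hypothetical stand-in for Curator's ConnectionState so the sketch compiles
on its own, and LeadershipGuard, leadershipAcquired(), mayCommit() and
finishWork() are names made up for this example. The assumption is that
stateChanged() is called from your connection state listener and
leadershipAcquired() at the start of takeLeadership().

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for Curator's ConnectionState (assumption, not the real type).
enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

// Remembers connection trouble in a flag that only this class changes,
// unlike the thread interrupt status, which any library call may clear.
final class LeadershipGuard {
    private final AtomicBoolean mayCommit = new AtomicBoolean(false);

    // Call at the start of takeLeadership().
    void leadershipAcquired() {
        mayCommit.set(true);
    }

    // Call from the connection state listener.
    void stateChanged(ConnState newState) {
        if (newState == ConnState.SUSPENDED || newState == ConnState.LOST) {
            mayCommit.set(false);
        }
        // RECONNECTED deliberately does not restore the flag:
        // leadership has to be acquired again first.
    }

    boolean mayCommit() {
        return mayCommit.get();
    }
}

final class LeadershipFlagDemo {
    // Checks the guard as late as possible and reports what the caller should do.
    static String finishWork(LeadershipGuard guard) {
        // ... database work would happen here ...
        return guard.mayCommit() ? "commit" : "rollback";
    }

    public static void main(String[] args) {
        LeadershipGuard guard = new LeadershipGuard();
        guard.leadershipAcquired();
        System.out.println(finishWork(guard));

        guard.stateChanged(ConnState.SUSPENDED);
        Thread.interrupted(); // clearing the interrupt flag does not affect the guard
        System.out.println(finishWork(guard));
    }
}
```

This only narrows the window. The connection can still be suspended
between the mayCommit() check and the actual commit, so the race described
above remains.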

Thanks,
Arie


On Wed, Nov 13, 2013 at 4:02 PM, Henrik Nordvik <henrikno@gmail.com> wrote:

> I've upgraded to curator 2.3.0.
> LeaderSelector still uses thread interrupting for signaling to the thread
> running takeLeadership() to stop, right?
> Inside my takeLeadership I do some database operations, and before
> committing I'm checking if I was interrupted, and roll back if I was.
> However, some code in between clears the interrupt flag (e.g. logback does
> this), so I'm committing even though I lost/suspended the connection.
>
> I need some other criteria to decide if I can commit or not. hasLeadership
> only checks a local flag, which is always true inside takeLeadership().
> Do I have another flag I can check?
>
>
> --
> Henrik Nordvik
>
>
> On Tue, Nov 5, 2013 at 5:21 PM, Jordan Zimmerman <
> jordan@jordanzimmerman.com> wrote:
>
>> This sounds like a variation of
>> https://issues.apache.org/jira/browse/CURATOR-54 - The next release of
>> Curator (later this week) provides a more robust way of canceling
>> leadership that doesn’t require thread interruption.
>>
>> -Jordan
>>
>> On Nov 5, 2013, at 1:47 AM, Henrik Nordvik <henrikno@gmail.com> wrote:
>>
>> Hi,
>>
>> I'm getting some strange behaviour when stopping zookeeper in one
>> environment that I can't reproduce locally.
>> The result is that the leader selector "quits" even though it is set as
>> auto-requeue. (I think that happens because the retry loop inside
>> LeaderSelector checks the interrupt-flag, which is set again even when I
>> cleared it).
>>
>> I think it boils down to getting
>>
>> 2013-11-04 18:22:32,501 INFO  [main-EventThread    ]
>> c.n.c.f.state.ConnectionStateManager      - State change: LOST
>> 2013-11-04 18:22:32,501 DEBUG [ectionStateManager-0]
>> s.f.s.a.feed.MyListener        - Interrupting thread
>> Thread[LeaderSelector-0,5,main]
>> 2013-11-04 18:22:32,503 INFO  [main-EventThread    ]
>> c.n.c.f.state.ConnectionStateManager      - State change: SUSPENDED
>> 2013-11-04 18:22:32,504 DEBUG [ectionStateManager-0]
>> s.f.s.a.feed.MyListener        - Interrupting thread
>> Thread[LeaderSelector-0,5,main]
>>
>> ... then I handle the interrupt in the leader thread.
>>
>> Then I get this:
>> 2013-11-04 18:22:36,465 INFO  [main-EventThread    ]
>> c.n.c.f.state.ConnectionStateManager      - State change: LOST
>> 2013-11-04 18:22:36,465 INFO  [main-EventThread    ]
>> c.n.c.f.state.ConnectionStateManager      - State change: SUSPENDED
>> 2013-11-04 18:22:36,465 DEBUG [ectionStateManager-0]
>> s.f.s.a.feed.MyListener        - StateChanged: LOST
>> 2013-11-04 18:22:36,465 DEBUG [ectionStateManager-0]
>> s.f.s.a.feed.MyListener        - Interrupting thread
>> Thread[LeaderSelector-0,5,main]
>> 2013-11-04 18:22:36,466 DEBUG [ectionStateManager-0]
>> s.f.s.a.feed.MyListener        - StateChanged: SUSPENDED
>> 2013-11-04 18:22:36,466 DEBUG [ectionStateManager-0]
>> s.f.s.a.feed.MyListener        - Interrupting thread
>> Thread[LeaderSelector-0,5,main]
>>
>>
>> Full log is here: https://gist.github.com/zerd/7316258
>>
>> The code follows the old leader selector example pretty well:
>>
>>     @Override
>>     public void takeLeadership(CuratorFramework curatorFramework) throws
>> Exception {
>>         ourThread = Thread.currentThread();
>>         logger.debug(format("(%s) Got leadership", ourThread));
>>         try {
>>             waitForAndPerformWork();
>>         } catch (InterruptedException e) {
>>             logger.debug(format("(%s) Interrupted ", ourThread), e);
>>         } finally {
>>             logger.debug(format("(%s) No longer leader", ourThread));
>>         }
>>     }
>>
>>     @Override
>>     public void stateChanged(CuratorFramework curatorFramework,
>> ConnectionState newState) {
>>         logger.debug("StateChanged: " + newState);
>>
>>         if ((newState == ConnectionState.LOST) || (newState ==
>> ConnectionState.SUSPENDED)) {
>>             if (ourThread != null) {
>>                 logger.debug("Interrupting thread " + ourThread);
>>                 ourThread.interrupt();
>>             } else {
>>                 logger.debug("Thread is null");
>>             }
>>         }
>>     }
>>
>> Is it supposed to go back and forth from lost to suspended?
>> My goal is to get it to resume trying to get the leadership when
>> zookeeper comes back. Do I have to requeue it manually when this happens?
>> Would upgrading to latest curator with CancelLeadershipException fix this?
>>
>> Thank you very much for your time.
>>
>> --
>> Henrik Nordvik
>>
>>
>>
>
