curator-user mailing list archives

From Henrik Nordvik <henri...@gmail.com>
Subject Switching from State suspended, to lost, to suspended
Date Tue, 05 Nov 2013 09:47:34 GMT
Hi,

I'm seeing some strange behaviour when stopping ZooKeeper in one
environment, and I can't reproduce it locally.
The result is that the leader selector "quits" even though it is set to
auto-requeue. (I think this happens because the retry loop inside
LeaderSelector checks the interrupt flag, which gets set again after I
have already cleared it.)

I think it boils down to this sequence in the log:

2013-11-04 18:22:32,501 INFO  [main-EventThread    ] c.n.c.f.state.ConnectionStateManager      - State change: LOST
2013-11-04 18:22:32,501 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener        - Interrupting thread Thread[LeaderSelector-0,5,main]
2013-11-04 18:22:32,503 INFO  [main-EventThread    ] c.n.c.f.state.ConnectionStateManager      - State change: SUSPENDED
2013-11-04 18:22:32,504 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener        - Interrupting thread Thread[LeaderSelector-0,5,main]

... then I handle the interrupt in the leader thread.

Then I get this:
2013-11-04 18:22:36,465 INFO  [main-EventThread    ] c.n.c.f.state.ConnectionStateManager      - State change: LOST
2013-11-04 18:22:36,465 INFO  [main-EventThread    ] c.n.c.f.state.ConnectionStateManager      - State change: SUSPENDED
2013-11-04 18:22:36,465 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener        - StateChanged: LOST
2013-11-04 18:22:36,465 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener        - Interrupting thread Thread[LeaderSelector-0,5,main]
2013-11-04 18:22:36,466 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener        - StateChanged: SUSPENDED
2013-11-04 18:22:36,466 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener        - Interrupting thread Thread[LeaderSelector-0,5,main]

Full log is here: https://gist.github.com/zerd/7316258

The code follows the old leader selector example pretty closely:

    @Override
    public void takeLeadership(CuratorFramework curatorFramework) throws Exception {
        ourThread = Thread.currentThread();
        logger.debug(format("(%s) Got leadership", ourThread));
        try {
            waitForAndPerformWork();
        } catch (InterruptedException e) {
            logger.debug(format("(%s) Interrupted ", ourThread), e);
        } finally {
            logger.debug(format("(%s) No longer leader", ourThread));
        }
    }

    @Override
    public void stateChanged(CuratorFramework curatorFramework, ConnectionState newState) {
        logger.debug("StateChanged: " + newState);

        if ((newState == ConnectionState.LOST) || (newState == ConnectionState.SUSPENDED)) {
            if (ourThread != null) {
                logger.debug("Interrupting thread " + ourThread);
                ourThread.interrupt();
            } else {
                logger.debug("Thread is null");
            }
        }
    }
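
To illustrate what I suspect is happening, here is a minimal sketch with no
Curator involved. The class name, the latches and the
clearReferenceInFinally switch are made up for this demo; only the interrupt
pattern mirrors my listener. It assumes the second state change arrives after
the leader thread has already caught the first InterruptedException and left
its takeLeadership body, so the second interrupt() sets the flag again on a
thread that is no longer doing leader work.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

class DoubleInterruptDemo {
    // Hypothetical stand-in for the listener's ourThread field.
    static final AtomicReference<Thread> ourThread = new AtomicReference<>();

    // Stand-in for stateChanged() on LOST/SUSPENDED: interrupt the remembered thread.
    static void stateChanged() {
        Thread t = ourThread.get();
        if (t != null) {
            t.interrupt();
        }
    }

    // Simulates one leadership term followed by a late second state change.
    // Returns true if the worker's interrupt flag is set after it has left
    // the simulated takeLeadership body.
    static boolean flagSetAfterLeadership(boolean clearReferenceInFinally) throws Exception {
        ourThread.set(null);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch leftLeadership = new CountDownLatch(1);
        CountDownLatch secondEventSent = new CountDownLatch(1);
        boolean[] flag = new boolean[1];

        Thread worker = new Thread(() -> {
            // Simulated takeLeadership().
            ourThread.set(Thread.currentThread());
            started.countDown();
            try {
                Thread.sleep(60_000); // stands in for waitForAndPerformWork()
            } catch (InterruptedException e) {
                // Catching the exception clears the interrupt flag.
            } finally {
                if (clearReferenceInFinally) {
                    ourThread.set(null); // later state changes find no thread to interrupt
                }
            }
            leftLeadership.countDown();

            // Spin instead of await(): a blocking await would consume the flag.
            while (secondEventSent.getCount() > 0) {
                Thread.yield();
            }
            flag[0] = Thread.currentThread().isInterrupted();
        }, "LeaderSelector-demo");
        worker.start();

        started.await();
        stateChanged();          // first event, e.g. LOST
        leftLeadership.await();
        stateChanged();          // second event, e.g. SUSPENDED, after leadership ended
        secondEventSent.countDown();
        worker.join();
        return flag[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reference kept:    flag set = " + flagSetAfterLeadership(false));
        System.out.println("reference cleared: flag set = " + flagSetAfterLeadership(true));
    }
}
```

If this is the right explanation, clearing ourThread in the finally block of
takeLeadership might be enough to avoid the stray interrupt, but I have not
verified that against LeaderSelector itself.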

Is the connection state supposed to go back and forth between LOST and
SUSPENDED like this?
My goal is to have the selector resume trying to acquire leadership when
ZooKeeper comes back. Do I have to requeue it manually when this happens?
Would upgrading to the latest Curator, with CancelLeadershipException, fix this?

Thank you very much for your time.

--
Henrik Nordvik
