curator-user mailing list archives

From Steve Boyle <sbo...@connexity.com>
Subject RE: Leader Latch question
Date Wed, 17 Aug 2016 18:07:19 GMT
I appreciate your response. Any thoughts on how the issue may have occurred in production?
Or on how to reproduce that scenario?

In the production case, there were two instances of the app – both configured for a list
of 5 zookeepers.

Thanks,
Steve

From: Jordan Zimmerman [mailto:jordan@jordanzimmerman.com]
Sent: Wednesday, August 17, 2016 11:03 AM
To: user@curator.apache.org
Subject: Re: Leader Latch question

Manual removal of the latch node isn’t supported. It would require the latch to add a watch
on its own node and that has performance/runtime overhead. The recommended behavior is to
watch for connection loss/suspended events and exit your latch when that happens.

-Jordan
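
The recommendation above (treat connection loss/suspension as a signal to stop acting as leader) might be sketched roughly as follows. This is a minimal illustration, not Curator code: `ConnState` is a local stand-in that mirrors the constants of Curator's `ConnectionState` enum so the example compiles without the library, and `LeadershipGuard` and its method names are hypothetical. In a real application, `connectionStateChanged` would presumably be called from a `ConnectionStateListener.stateChanged(...)` callback, and `leadershipAcquired`/`leadershipLost` from a `LeaderLatchListener`'s `isLeader()`/`notLeader()`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Local stand-in mirroring Curator's ConnectionState constants (assumption:
// used here only so the sketch has no external dependency).
enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST, READ_ONLY }

// Hypothetical helper: tracks whether this instance may act as 'active'.
final class LeadershipGuard {
    private final AtomicBoolean active = new AtomicBoolean(false);
    private final Runnable onStepDown;

    LeadershipGuard(Runnable onStepDown) {
        this.onStepDown = onStepDown;
    }

    // Call when the latch reports that this instance became leader.
    void leadershipAcquired() {
        active.set(true);
    }

    // Call when the latch reports that this instance is no longer leader.
    void leadershipLost() {
        stepDown();
    }

    // Call on every connection state change; SUSPENDED and LOST both
    // cause the instance to stop acting as leader.
    void connectionStateChanged(ConnState newState) {
        if (newState == ConnState.SUSPENDED || newState == ConnState.LOST) {
            stepDown();
        }
    }

    boolean isActive() {
        return active.get();
    }

    // Runs the step-down callback only on an active -> inactive transition.
    private void stepDown() {
        if (active.compareAndSet(true, false)) {
            onStepDown.run();
        }
    }
}
```

The guard deliberately does not become active again on RECONNECTED; it waits for the latch to report leadership again, so a reconnecting instance cannot resume the 'active' role on its own.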

On Aug 17, 2016, at 12:43 PM, Steve Boyle <sboyle@connexity.com> wrote:

I’m using the Leader Latch recipe. I can successfully bring up two instances of my app
and have one become ‘active’ and one become ‘standby’. Almost everything works as expected.

We had an issue in production: when adding a zookeeper to our existing quorum, both instances
of the app became ‘active’. Unfortunately, the log files rolled over before we could
check for exceptions.

I’ve been trying to reproduce this issue in a test environment. There, I have two instances
of my app configured to use a single zookeeper; this zookeeper is part of a 5 node quorum
and is not currently the leader. I can trigger both instances of the app to become ‘active’
if I use zkCli to manually delete the latch path on the single zookeeper to which my apps
are connected. When I do, I can see via debug logging that the instance that was previously
‘standby’ gets a notification from zookeeper: “Got WatchedEvent state:SyncConnected type:NodeDeleted”.
However, the instance that had already been active gets no notification at all.

Is it expected that manually removing the latch path would only generate notifications to
some instances of my app?

Thanks,
Steve Boyle
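
The behaviour described in this message (only the former standby sees `NodeDeleted`) can be illustrated with a toy model. It assumes, as the reply implies, that a participant places no watch on its own node, and further assumes (not confirmed in this thread) that each non-leader watches only the node immediately ahead of it in sequence order. `LatchModel` and all of its methods are hypothetical and do not use ZooKeeper or Curator.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy in-memory model of a latch path; not Curator's implementation.
final class LatchModel {
    // sequence number of each participant's node -> participant name
    private final TreeMap<Integer, String> nodes = new TreeMap<>();
    // participants that currently believe they are leader
    private final Set<String> believesLeader = new TreeSet<>();
    private int nextSeq = 0;

    // A participant creates its node; the lowest sequence number leads.
    void join(String participant) {
        nodes.put(nextSeq++, participant);
        if (nodes.firstEntry().getValue().equals(participant)) {
            believesLeader.add(participant);
        }
    }

    // Simulates deleting a participant's node out-of-band (e.g. by hand).
    void deleteNodeOf(String participant) {
        Integer seq = null;
        for (Map.Entry<Integer, String> e : nodes.entrySet()) {
            if (e.getValue().equals(participant)) {
                seq = e.getKey();
            }
        }
        if (seq == null) {
            return;
        }
        // Only the next participant in line was watching this node.
        Map.Entry<Integer, String> watcher = nodes.higherEntry(seq);
        nodes.remove(seq);
        // The watcher is notified and re-checks whether it is now first.
        if (watcher != null && nodes.firstKey().equals(watcher.getKey())) {
            believesLeader.add(watcher.getValue());
        }
        // The owner of the deleted node had no watch on it, so it receives
        // no event and its belief is left unchanged.
    }

    Set<String> leaders() {
        return new TreeSet<>(believesLeader);
    }
}
```

In this model, joining "A" and then "B" and deleting A's node leaves both in `leaders()`: B is promoted by the watch event, while A never learns that its node is gone.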
