curator-user mailing list archives

From Jens Rantil <jens.ran...@tink.se>
Subject How to avoid CuratorConnectionLossException on leader loss?
Date Sun, 13 Sep 2015 16:15:28 GMT
Dear Curator(s),

A couple of days ago we did some maintenance on our ZooKeeper ensemble and
did a rolling restart of each node. Restarting the followers worked like a
charm. However, when we restarted the leader,
CuratorConnectionLossException exceptions were thrown/logged and trickled
down to our application code until a re-election had occurred. Example:

https://gist.github.com/JensRantil/309fa1bf17ee2982b8e7

We were hoping that Curator would gracefully retry until a leader had been
re-elected, but I'm sure there is something we need to tweak to keep this
from happening again.

*Question:* To keep this from happening in the future, should we simply
change our retry policy so that it keeps retrying longer before giving up?

Additional information:

   - ZooKeeper version 1.4.5
   - Curator version 2.7.0
   - We are currently using the following retry policy: new
   ExponentialBackoffRetry(1000, 3);
   - ZooKeeper configuration is all default except initLimit=60 and
   syncLimit=30.
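
To put a number on how long the retry policy listed above keeps trying, here
is a rough sketch of the arithmetic. It rests on an assumption that is not
confirmed anywhere in this thread: that retry i (counting from 0) sleeps
baseSleepTimeMs * max(1, r), where r is drawn at random from [0, 2^(i+1)).
The class and method names are made up for illustration, and any cap on the
sleep time is ignored.

```java
// Sketch only: estimates how long an exponential backoff policy keeps retrying.
// Assumption (not confirmed in this thread): retry i (0-based) sleeps
// baseSleepTimeMs * max(1, r), with r drawn at random from [0, 2^(i+1)).
// RetryBudget and its methods are hypothetical names, not part of Curator.
final class RetryBudget {

    // Shortest possible total sleep: every retry uses the minimum multiplier of 1.
    static long minTotalSleepMs(long baseSleepTimeMs, int maxRetries) {
        return baseSleepTimeMs * maxRetries;
    }

    // Longest possible total sleep: retry i uses the maximum multiplier 2^(i+1) - 1.
    static long maxTotalSleepMs(long baseSleepTimeMs, int maxRetries) {
        long total = 0;
        for (int i = 0; i < maxRetries; i++) {
            total += baseSleepTimeMs * ((1L << (i + 1)) - 1);
        }
        return total;
    }

    public static void main(String[] args) {
        // The values from this message: base sleep 1000 ms, 3 retries.
        System.out.println("min total sleep ms: " + minTotalSleepMs(1000, 3));
        System.out.println("max total sleep ms: " + maxTotalSleepMs(1000, 3));
    }
}
```

Under that assumption, ExponentialBackoffRetry(1000, 3) sleeps for somewhere
between 3 and 11 seconds in total before giving up, not counting the time each
failed attempt itself takes. If a re-election takes longer than that, the
exception would reach application code.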

Thanks,
Jens

-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.rantil@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

