curator-dev mailing list archives

From Jordan Zimmerman <jor...@jordanzimmerman.com>
Subject Re: How to obtain stable leader election over unstable ZK connections
Date Thu, 20 Aug 2015 14:41:52 GMT
I wonder if we can add error handling policies to Curator. Currently, the policy of all recipes
is hard-coded to treat SUSPENDED as a type of lost session. We could change this to be injected
like the retry policy. To solve this particular issue we’d also need to introduce a SESSION_LOST
state of some type. This is complicated as Curator re-creates connections internally. 
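
A minimal sketch of the idea above, not Curator's actual API: every name here (`State`, `ErrorPolicy`, `Leader`, the two policy constants) is hypothetical and defined inside the example. It only illustrates how a recipe could ask an injected policy whether a connection state counts as an error, instead of hard-coding SUSPENDED as a lost session.

```java
public class ErrorPolicySketch {
    // Hypothetical connection states for illustration only.
    // SESSION_LOST is the new state proposed in this message.
    enum State { CONNECTED, SUSPENDED, RECONNECTED, SESSION_LOST }

    // Hypothetical injectable policy, analogous in spirit to a retry policy:
    // it decides which states a recipe should treat as an error.
    interface ErrorPolicy {
        boolean isErrorState(State state);
    }

    // Models the hard-coded behaviour described above: SUSPENDED is treated
    // as a type of lost session.
    static final ErrorPolicy SUSPENDED_IS_ERROR =
        state -> state == State.SUSPENDED || state == State.SESSION_LOST;

    // Alternative policy: only a genuinely lost session is an error.
    static final ErrorPolicy SESSION_LOST_ONLY =
        state -> state == State.SESSION_LOST;

    // Toy stand-in for a leader recipe that consults the injected policy.
    static final class Leader {
        private final ErrorPolicy policy;
        private boolean leadership = true; // assume leadership was already acquired

        Leader(ErrorPolicy policy) {
            this.policy = policy;
        }

        // Called on every connection state change; gives up leadership only
        // when the policy classifies the new state as an error.
        void stateChanged(State newState) {
            if (policy.isErrorState(newState)) {
                leadership = false;
            }
        }

        boolean hasLeadership() {
            return leadership;
        }
    }

    public static void main(String[] args) {
        Leader strict = new Leader(SUSPENDED_IS_ERROR);
        Leader tolerant = new Leader(SESSION_LOST_ONLY);

        // Simulate a brief disconnect followed by a reconnect on the same session.
        for (State s : new State[] { State.SUSPENDED, State.RECONNECTED }) {
            strict.stateChanged(s);
            tolerant.stateChanged(s);
        }

        System.out.println("strict keeps leadership:   " + strict.hasLeadership());
        System.out.println("tolerant keeps leadership: " + tolerant.hasLeadership());
    }
}
```

With the tolerant policy, a SUSPENDED followed by RECONNECTED leaves leadership untouched; the open question remains how the client would reliably detect and report the SESSION_LOST case.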

Thoughts?

-Jordan



On August 20, 2015 at 2:10:52 AM, Dong Lei (donglei@microsoft.com) wrote:

Hi curator-devs:  

We use Spark in standalone mode, in which Spark relies on Curator to manage ZK connections and
elect a leader. Our ZooKeeper is not always stable, and we sometimes get "session suspended and
reconnected". The problem is that this kind of disconnect and reconnect triggers leader
election quite often, and Spark's reaction to a leadership switch can be very costly.

So I'm wondering whether it's possible to tolerate such failures when we reconnect quickly
and the session is actually kept after the reconnection.
Does such a requirement make sense to you?

Any advice will be appreciated.  


Thanks  
Dong Lei  

