aurora-reviews mailing list archives

From Zameer Manji <zma...@apache.org>
Subject Re: Review Request 54288: Make leader elections resilient to ZK disconnections.
Date Fri, 02 Dec 2016 22:17:13 GMT


> On Dec. 2, 2016, 7:58 a.m., Joshua Cohen wrote:
> > Thanks for picking this up! This is a basic question, but I just want to be sure:
> > by mimicking the old behavior, we're not running the risk of re-introducing the same
> > deadlock we were trying to fix by moving to Curator, right? I'm not sure where the
> > deadlock was caused... was it in our implementation of the `SingletonService` recipe,
> > was it in the ZK client itself, or somewhere else entirely?

From my understanding, it was our implementation of the SingletonService code.


> On Dec. 2, 2016, 7:58 a.m., Joshua Cohen wrote:
> > src/test/java/org/apache/aurora/scheduler/discovery/CuratorSingletonServiceTest.java, lines 215-218
> > <https://reviews.apache.org/r/54288/diff/1/?file=1574553#file1574553line215>
> >
> >     Should we have an escape hatch for the case where we never become leader in
> >     this test (i.e. sleep for up to N seconds then `fail()`)?

Perhaps I could just add a timeout to the test? JUnit allows me to do that.
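
For illustration, here is a minimal JDK-only sketch of the bounded wait suggested above. The `LeadershipWait` class and `awaitLeadership` helper are hypothetical names and not part of the patch; the latch stands in for whatever signal the test's leadership callback would set. With JUnit 4, `@Test(timeout = ...)` (in milliseconds) would instead bound the whole test method.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not taken from the review: fail instead of hanging
// forever when leadership is never acquired.
final class LeadershipWait {
  private LeadershipWait() {}

  // Blocks until the latch is released, or throws AssertionError once the
  // timeout elapses (or the waiting thread is interrupted).
  static void awaitLeadership(CountDownLatch leading, long timeout, TimeUnit unit) {
    try {
      if (!leading.await(timeout, unit)) {
        throw new AssertionError("Never became leader within " + timeout + " " + unit);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new AssertionError("Interrupted while waiting for leadership", e);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch leading = new CountDownLatch(1);
    // Stand-in for the listener callback that fires once elected.
    Thread elector = new Thread(leading::countDown);
    elector.start();
    awaitLeadership(leading, 5, TimeUnit.SECONDS);
    elector.join();
    System.out.println("became leader");
  }
}
```

The explicit latch gives a failure message that names the missing election, whereas a test-level timeout only reports that the method ran too long.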


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54288/#review157759
-----------------------------------------------------------


On Dec. 1, 2016, 7:19 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54288/
> -----------------------------------------------------------
> 
> (Updated Dec. 1, 2016, 7:19 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, John Sirois, and Stephan Erb.
> 
> 
> Bugs: AURORA-1669
>     https://issues.apache.org/jira/browse/AURORA-1669
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> As documented in AURORA-1840, the Curator `LeaderLatch` recipe abdicates
> leadership if the ZK connection is lost or if there is a timeout. This is not
> compatible with the commons-based implementation, which would only abdicate
> leadership if the ZK session timeout occurred.
> 
> This replaces the `LeaderLatch` recipe with the `LeaderSelector` recipe with a
> custom listener that only loses leadership if a connection loss occurs.
> 
> 
> Diffs
> -----
> 
>   commons/src/main/java/org/apache/aurora/common/zookeeper/testing/ZooKeeperTestServer.java 50acaeba82e163f8f2970a264cbd889c9eb3b5ed
>   src/main/java/org/apache/aurora/scheduler/discovery/CuratorSingletonService.java c378172c850aafe0a9381552b5067277b40dbfab
>   src/test/java/org/apache/aurora/scheduler/discovery/BaseCuratorDiscoveryTest.java a2b4125369d1f6c0a79bc4ac0fb3d2dab8a6c583
>   src/test/java/org/apache/aurora/scheduler/discovery/CuratorSingletonServiceTest.java 6ea49b0c690d288ff59d1d4798144bfa2d153d3a
> 
> Diff: https://reviews.apache.org/r/54288/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>

