geode-dev mailing list archives

From Galen O'Sullivan <gosulli...@pivotal.io>
Subject Re: Review Request 59925: GEODE-3052 Restarting 2 locators within 1s of each other causes potential locator split brain
Date Thu, 08 Jun 2017 23:45:49 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59925/#review177422
-----------------------------------------------------------


Ship it!




This makes sense to me: we remove the locators if we can't connect to them.

I wonder what happens if the two locators can't talk to each other (at first, anyway) but
can talk to the rest of the cluster. I imagine this is handled by our view management and,
as long as the cluster is otherwise healthy, it will be fine.

As an aside, I'm curious about weight and failure -- do we expire servers from the weighting
for split-brain detection after a while?

- Galen O'Sullivan


On June 8, 2017, 6:36 p.m., Bruce Schuchardt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59925/
> -----------------------------------------------------------
> 
> (Updated June 8, 2017, 6:36 p.m.)
> 
> 
> Review request for geode, Alexander Murmann, Galen O'Sullivan, Hitesh Khamesra, Udo Kohlmeyer,
> and Brian Rowe.
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> When restarting from a locatorView.dat file, we should ignore any locator entries in the
> view. Recovery tries to get this state from other locators before resorting to the
> persisted view, so by the time we use the persisted view we know that all of its locator
> entries are invalid. This allows the locators to quickly move into the concurrent-startup
> algorithm and find each other.
> 
> I removed the Flaky categorization from the test that I modified to reproduce the problem.
> A subclass's use of the test had been reported as a Flaky failure, but I found that the
> ticket was closed.
> 
> 
> Diffs
> -----
> 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/locator/GMSLocator.java e3635f2d93aae212cbff2f2058b6dc728a04776a
>   geode-core/src/test/java/org/apache/geode/distributed/LocatorDUnitTest.java 8ff9b67e13dd50499d861ff62ddae3fb8668dd28
>   geode-core/src/test/java/org/apache/geode/distributed/LocatorUDPSecurityDUnitTest.java 9d49d30abfb8acccd8a5547ba0ee3c7bcf9e7970
> 
> 
> Diff: https://reviews.apache.org/r/59925/diff/1/
> 
> 
> Testing
> -------
> 
> The problem was easily reproduced using LocatorDUnitTest.testStartTwoLocators by repeatedly
> cycling the locators. It failed every time I ran it.
> 
> 
> Thanks,
> 
> Bruce Schuchardt
> 
>
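
The idea in the description can be illustrated with a minimal Java sketch: when a locator falls back to its persisted view, it drops every locator entry before using that view. This is not Geode's implementation. The `PersistedViewFilter` class, the `Member` record and its `isLocator` flag are hypothetical stand-ins for the real membership types used by GMSLocator.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-ins; Geode's real membership and view
// classes (used by GMSLocator) are not reproduced here.
public class PersistedViewFilter {

  /** Hypothetical view entry: a member name plus whether it is a locator. */
  record Member(String name, boolean isLocator) {}

  /**
   * Returns the entries of a view recovered from a locatorView.dat-style file,
   * with every locator entry removed. Assumption taken from the review
   * description: the persisted view is only used after failing to get a view
   * from other locators, so its locator entries are treated as invalid.
   */
  static List<Member> dropLocators(List<Member> persistedView) {
    return persistedView.stream()
        .filter(m -> !m.isLocator()) // keep only non-locator members
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Member> recovered = List.of(
        new Member("locator1", true),
        new Member("locator2", true),
        new Member("server1", false));
    // Prints a list containing only the server1 entry.
    System.out.println(dropLocators(recovered));
  }
}
```

Per the description, discarding those entries is what lets the restarting locators go straight into the concurrent-startup algorithm instead of relying on stale locator information.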

