geode-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hitesh Khamesra <hkhame...@pivotal.io>
Subject Re: Review Request 60106: GEODE-3052 Restarting 2 locators within 1s of each other causes potential locator split brain
Date Wed, 21 Jun 2017 22:10:23 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60106/#review178573
-----------------------------------------------------------




geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/locator/GMSLocator.java
Lines 235 (patched)
<https://reviews.apache.org/r/60106/#comment252637>

    we are setting -1 viewid for recorded view while recovery. Thus do we need "usingRecoveredView"
flag here?


- Hitesh Khamesra


On June 19, 2017, 4:09 p.m., Bruce Schuchardt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60106/
> -----------------------------------------------------------
> 
> (Updated June 19, 2017, 4:09 p.m.)
> 
> 
> Review request for geode and Hitesh Khamesra.
> 
> 
> Bugs: GEODE-3052
>     https://issues.apache.org/jira/browse/GEODE-3052
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> There were four problems that new unit tests hit:
> 1. when recovering a view from disk we were treating it as a definitive (live) view.
 I've moved it to a new variable in GMSLocator and set its viewId to -1.  At the same time
I set the initial GMSJoinLeave SearchState.viewId to -100 so it will be overridden by the
one returned by the locator.  These changes allow GmsJoinLeave to know that the potential
coordinator is from a recovered view.
> 2. when trying to join with a recovered view GMSJoinLeave.join() was giving up after
the second ID in the view and becoming the coordinator.  It needs to keep trying until the
list is exhausted, and it shouldn't sleep between attempts.
> 3. GMSLocator wasn't returning registrants for use in findCoordinatorFromView().  This
was causing it to choose itself as the coordinator instead of using registrant sort order
and choosing a different registrant as the coordinator.
> 4. During concurrent startup GMSLocator didn't know when the decision was made to become
coordinator.  It is now notified of this decision and processRequest() uses this flag to have
it override anything in the registrants set or in the recovered view.
> 
> 
> Diffs
> -----
> 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/NetView.java
26b03276b0abbf6210a5602a8c551abe38edc261 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/GMSUtil.java
c6bef571134c6444a297cc8fe0bb0b7eb95f41f4 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/interfaces/Locator.java
c5fdf45411581a36feca220e14a0551f3197d368 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/locator/FindCoordinatorResponse.java
edfaf625e6c652f46d9323c1116791f1c69fda59 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/locator/GMSLocator.java
93fa9dab4ec2c8e43fc41cfd3b8ad986f96cf00f 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeave.java
8abcc456e42ad00a558a93f87bd3ae74ce88d146 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/messenger/GMSEncrypt.java
c7b1a26b47cf2c913d9de30d6934ad5b3ac49840 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/messenger/JGroupsMessenger.java
390824eb11a2e72a21d951539c2e03ed8025be82 
>   geode-core/src/test/java/org/apache/geode/distributed/LocatorDUnitTest.java 7ecca6146f6b7a542ae9864d7fabd48c9794ecac

>   geode-core/src/test/java/org/apache/geode/distributed/LocatorUDPSecurityDUnitTest.java
df1d8d1101a5f9d04c402922955a283353aa3b7c 
>   geode-core/src/test/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeaveTestHelper.java
19cee066a488198471ebf4093045853e36d5ba78 
>   geode-core/src/test/java/org/apache/geode/distributed/internal/membership/gms/messenger/GMSEncryptJUnitTest.java
7f64c670400464aa8e6a73405516bd6e891a006b 
>   geode-core/src/test/java/org/apache/geode/test/dunit/standalone/DUnitLauncher.java
b35270e2d97930cee68d8c54221a04c20dfb96de 
> 
> 
> Diff: https://reviews.apache.org/r/60106/diff/2/
> 
> 
> Testing
> -------
> 
> New unit tests, regression testing (under way), precheckin (under way)
> 
> 
> Thanks,
> 
> Bruce Schuchardt
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message