geode-dev mailing list archives

From Bruce Schuchardt <bschucha...@pivotal.io>
Subject Review Request 60106: GEODE-3052 Restarting 2 locators within 1s of each other causes potential locator split brain
Date Thu, 15 Jun 2017 00:00:00 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60106/
-----------------------------------------------------------

Review request for geode and Hitesh Khamesra.


Bugs: GEODE-3052
    https://issues.apache.org/jira/browse/GEODE-3052


Repository: geode


Description
-------

There were four problems that new unit tests hit:
1. When recovering a view from disk we were treating it as a definitive (live) view.  I've
moved the recovered view to a new variable in GMSLocator and set its viewId to -1.  At the
same time I set the initial GMSJoinLeave SearchState.viewId to -100 so that it will be
overridden by the viewId returned by the locator.  These changes let GMSJoinLeave know when
the potential coordinator comes from a recovered view.
2. When trying to join with a recovered view, GMSJoinLeave.join() was giving up after the
second ID in the view and becoming the coordinator.  It needs to keep trying until the list
is exhausted, and it shouldn't sleep between attempts.
3. GMSLocator wasn't returning registrants for use in findCoordinatorFromView().  As a
result, a member chose itself as the coordinator instead of using registrant sort order to
choose a different registrant.
4. During concurrent startup, GMSLocator didn't know when the decision was made to become
coordinator.  It is now notified of this decision, and processRequest() uses the resulting
flag to override anything in the registrants set or in the recovered view.
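
The four fixes above can be sketched in a few lines of Java. This is only an
illustration under stated assumptions, not the code in the diff: the class
RecoveredViewSketch, its method names, and the use of plain strings as member
IDs are all hypothetical. Only the values -1 and -100, the "try every ID
without sleeping" rule, the registrant sort order, and the "already
coordinator" override come from the description.

```java
import java.util.List;
import java.util.Optional;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names exist in Geode.
class RecoveredViewSketch {
  // Values quoted in the description above.
  static final int RECOVERED_VIEW_ID = -1;
  static final int INITIAL_SEARCH_VIEW_ID = -100;

  // Item 1 (assumed comparison): the searcher keeps the largest view id it
  // has seen, so even a recovered view (-1) replaces the initial -100.
  static int updateSearchViewId(int current, int fromLocator) {
    return Math.max(current, fromLocator);
  }

  // Item 1: a view id of -1 marks a view recovered from disk, not a live one.
  static boolean isFromRecoveredView(int viewId) {
    return viewId == RECOVERED_VIEW_ID;
  }

  // Item 2: try every candidate in order, with no sleep between attempts,
  // and stop only when one accepts or the list is exhausted.
  static Optional<String> tryJoin(List<String> candidates,
                                  Predicate<String> attemptJoin) {
    for (String candidate : candidates) {
      if (attemptJoin.test(candidate)) {
        return Optional.of(candidate);
      }
    }
    return Optional.empty(); // only now may the caller become coordinator
  }

  // Items 3 and 4: a member that has already decided to be coordinator wins;
  // otherwise the first registrant in sort order is chosen.
  static String chooseCoordinator(SortedSet<String> registrants, String self,
                                  boolean selfIsCoordinator) {
    if (selfIsCoordinator) {
      return self;
    }
    SortedSet<String> all = new TreeSet<>(registrants);
    all.add(self);
    return all.first();
  }
}
```

With registrants "loc1" and "loc3" and a local member "loc2", chooseCoordinator
picks "loc1" unless the local member has already become coordinator, in which
case it returns "loc2". Every concurrently starting locator that sees the same
registrant set therefore arrives at the same answer.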


Diffs
-----

  geode-core/src/main/java/org/apache/geode/distributed/internal/membership/NetView.java 26b03276b0abbf6210a5602a8c551abe38edc261
  geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/GMSUtil.java c6bef571134c6444a297cc8fe0bb0b7eb95f41f4
  geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/interfaces/Locator.java c5fdf45411581a36feca220e14a0551f3197d368
  geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/locator/GMSLocator.java 93fa9dab4ec2c8e43fc41cfd3b8ad986f96cf00f
  geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeave.java 8abcc456e42ad00a558a93f87bd3ae74ce88d146
  geode-core/src/test/java/org/apache/geode/distributed/LocatorDUnitTest.java 7ecca6146f6b7a542ae9864d7fabd48c9794ecac
  geode-core/src/test/java/org/apache/geode/distributed/LocatorUDPSecurityDUnitTest.java df1d8d1101a5f9d04c402922955a283353aa3b7c
  geode-core/src/test/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeaveTestHelper.java 19cee066a488198471ebf4093045853e36d5ba78


Diff: https://reviews.apache.org/r/60106/diff/1/


Testing
-------

New unit tests, regression testing (under way), precheckin (under way)


Thanks,

Bruce Schuchardt

