geode-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From bschucha...@apache.org
Subject [2/2] incubator-geode git commit: GEODE-77 fixes for failing unit and integration tests
Date Thu, 24 Sep 2015 22:55:13 GMT
GEODE-77 fixes for failing unit and integration tests

  if the coordinator returned by locators doesn't work we now
  ask other members in the view returned by the locator

  leave & crash events now ack the view currently being installed
  so that it won't wait for the associated members

  views are always installed using the two-phase protocol.
  GemFire did this and some algorithms, such as rebalancing,
  are thrown off if departed members are in the view when
  a rebalance starts.

  views are always transmitted by the View Creator thread.
  GMSJoinLeave.becomeCoordinator() installs an initial view
  into the View Creator before starting it if there is a view
  to prepare and install.

  Rebalance operations targetting a new member were failing to
  achieve balance if an old ID for that member was still in the
  membership view.  We now detect those old IDs and remove them
  when the new member joins.

  ClientProxyMembershipID was not being deserialized properly
  after being transmitted from one member to another.

  HealthMonitor classes with Ping in their names are renamed to
  Check.  I've found that Ping confuses people - they
  sometimes think that gemfire is using network ping protocol

  HealthMonitor beSick/playDead were tightened up

  HealthMonitor was not recording activity from other members unless
  it was currently watching that member.  This caused a lot of
  unnecessary suspicion when switching from one member to another.

  HealthMonitor wasn't clear about who was raising suspicion and
  who was the target of the suspicion.

  FindCoordinatorRequest was using java serialization for rejected
  coordinator IDs.

  GMSMembershipManager's latestViewLock use wasn't quite the same
  as in the membership manager in GemFire 8.2 and caused a deadlock.

  fixing GEODE-360: PRTombstoneMessage should ignore
  ForceReattemptException

  LogFileParser wasn't recognizing timestamps for thread dumps, causing
  them to be mis-sorted when merging log files.


Project: http://git-wip-us.apache.org/repos/asf/incubator-geode/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-geode/commit/eab327f6
Tree: http://git-wip-us.apache.org/repos/asf/incubator-geode/tree/eab327f6
Diff: http://git-wip-us.apache.org/repos/asf/incubator-geode/diff/eab327f6

Branch: refs/heads/feature/GEODE-77
Commit: eab327f6a2b9bc2ca715464f61b4388ce304df2f
Parents: 99e50c1
Author: Bruce Schuchardt <bschuchardt@pivotal.io>
Authored: Thu Sep 24 15:54:30 2015 -0700
Committer: Bruce Schuchardt <bschuchardt@pivotal.io>
Committed: Thu Sep 24 15:54:30 2015 -0700

----------------------------------------------------------------------
 .../internal/DistributionManager.java           |   5 +-
 .../internal/InternalDistributedSystem.java     |   2 -
 .../distributed/internal/InternalLocator.java   |   4 -
 .../internal/membership/MembershipManager.java  |   3 +-
 .../internal/membership/NetView.java            |   4 +-
 .../internal/membership/gms/Services.java       |   2 +-
 .../membership/gms/fd/GMSHealthMonitor.java     | 149 +++--
 .../gms/locator/FindCoordinatorRequest.java     |  24 +-
 .../gms/locator/FindCoordinatorResponse.java    |  71 ++-
 .../membership/gms/locator/GMSLocator.java      |  41 +-
 .../membership/gms/membership/GMSJoinLeave.java | 584 ++++++++++++++-----
 .../gms/messages/CheckRequestMessage.java       |  64 ++
 .../gms/messages/CheckResponseMessage.java      |  54 ++
 .../gms/messages/InstallViewMessage.java        |   3 +-
 .../gms/messages/LeaveRequestMessage.java       |   3 +-
 .../gms/messages/PingRequestMessage.java        |  64 --
 .../gms/messages/PingResponseMessage.java       |  54 --
 .../gms/messages/RemoveMemberMessage.java       |   3 +-
 .../gms/messenger/JGroupsMessenger.java         |   2 +
 .../gms/mgr/GMSMembershipManager.java           |  98 ++--
 .../gemstone/gemfire/internal/DSFIDFactory.java |   8 +-
 .../internal/DataSerializableFixedID.java       |   4 +-
 .../cache/partitioned/PRTombstoneMessage.java   |   3 +-
 .../cache/tier/sockets/CacheClientNotifier.java |   8 +-
 .../tier/sockets/ClientProxyMembershipID.java   |   9 +
 .../cache/tier/sockets/ServerConnection.java    |   2 +-
 .../gemfire/internal/logging/LogFileParser.java |   8 +
 ...ckOverflowRegionCCECompressionDUnitTest.java |  61 --
 ...PersistentRegionCCECompressionDUnitTest.java |  62 --
 ...tributedNoAckRegionCompressionDUnitTest.java |  62 --
 .../GlobalRegionCompressionDUnitTest.java       |  61 --
 .../gemfire/distributed/LocatorDUnitTest.java   |  42 +-
 .../membership/MembershipJUnitTest.java         |  20 +-
 .../membership/gms/MembershipManagerHelper.java |   9 +-
 .../membership/GMSHealthMonitorJUnitTest.java   |  44 +-
 .../gms/membership/GMSJoinLeaveJUnitTest.java   |  42 ++
 .../PRFunctionExecutionTimeOutDUnitTest.java    |   1 +
 .../java/dunit/standalone/DUnitLauncher.java    |   4 +-
 38 files changed, 969 insertions(+), 715 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/eab327f6/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/DistributionManager.java
----------------------------------------------------------------------
diff --git a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/DistributionManager.java b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/DistributionManager.java
index fdd8092..33d5b80 100644
--- a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/DistributionManager.java
+++ b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/DistributionManager.java
@@ -2844,7 +2844,8 @@ public class DistributionManager
     if (ch != null) {
       MembershipManager mgr = ch.getMembershipManager();
       if (mgr != null) {
-        synchronized (mgr.getViewLock()) {
+        mgr.getViewLock().writeLock().lock();
+        try {
           synchronized (this.membersLock) {
             // Don't let the members come and go while we are adding this
             // listener.  This ensures that the listener (probably a
@@ -2852,6 +2853,8 @@ public class DistributionManager
             addAllMembershipListener(l);
             return getDistributionManagerIdsIncludingAdmin();
           }
+        } finally {
+          mgr.getViewLock().writeLock().unlock();
         }
       }
     }

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/eab327f6/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/InternalDistributedSystem.java
----------------------------------------------------------------------
diff --git a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/InternalDistributedSystem.java b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/InternalDistributedSystem.java
index a14a332..c3929c0 100644
--- a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/InternalDistributedSystem.java
+++ b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/InternalDistributedSystem.java
@@ -743,7 +743,6 @@ public final class InternalDistributedSystem
           } finally {
             if (!startedPeerLocation) {
               this.startedLocator.stop();
-              this.startedLocator = null;
             }
           }
         }
@@ -776,7 +775,6 @@ public final class InternalDistributedSystem
       } finally {
         if (!finished) {
           this.startedLocator.stop();
-          this.startedLocator = null;
         }
       }
     }

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/eab327f6/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/InternalLocator.java
----------------------------------------------------------------------
diff --git a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/InternalLocator.java b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/InternalLocator.java
index 2158414..6ea54e2 100644
--- a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/InternalLocator.java
+++ b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/InternalLocator.java
@@ -725,10 +725,6 @@ public class InternalLocator extends Locator implements ConnectListener {
       startCache(existing);
     }
     else {
-      if (System.getProperty("p2p.joinTimeout", "").length() == 0) {
-          System.setProperty("p2p.joinTimeout", "5000");
-        }
-
       String thisLocator;
       {
         StringBuilder sb = new StringBuilder(100);

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/eab327f6/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/MembershipManager.java
----------------------------------------------------------------------
diff --git a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/MembershipManager.java b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/MembershipManager.java
index 85203e2..6965a44 100755
--- a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/MembershipManager.java
+++ b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/MembershipManager.java
@@ -11,6 +11,7 @@ import java.io.NotSerializableException;
 import java.util.Map;
 import java.util.Set;
 import java.util.concurrent.TimeoutException;
+import java.util.concurrent.locks.ReadWriteLock;
 
 import com.gemstone.gemfire.SystemFailure;
 import com.gemstone.gemfire.distributed.DistributedMember;
@@ -48,7 +49,7 @@ public interface MembershipManager {
    * While this lock is held the view can't change.
    * @since 5.7
    */
-  public Object getViewLock();
+  public ReadWriteLock getViewLock();
 
   /**
    * Return a {@link InternalDistributedMember} representing the current system

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/eab327f6/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/NetView.java
----------------------------------------------------------------------
diff --git a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/NetView.java b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/NetView.java
index b2867db..65fe913 100644
--- a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/NetView.java
+++ b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/NetView.java
@@ -44,7 +44,7 @@ public class NetView implements DataSerializableFixedID {
   private Set<InternalDistributedMember> crashedMembers;
   private InternalDistributedMember creator;
   private Set<InternalDistributedMember> hashedMembers;
-  static final private Random rd = new Random();
+  static public final Random RANDOM = new Random();
 
   public NetView() {
     viewId = 0;
@@ -242,7 +242,7 @@ public class NetView implements DataSerializableFixedID {
       results.add(localAddress);// to add local address
 
       if (notPreferredCoordinatorList.size() > 0) {
-        int idx = rd.nextInt(notPreferredCoordinatorList.size());
+        int idx = RANDOM.nextInt(notPreferredCoordinatorList.size());
         results.add(notPreferredCoordinatorList.get(idx)); // to add non preferred local address
       }
     }

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/eab327f6/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/Services.java
----------------------------------------------------------------------
diff --git a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/Services.java b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/Services.java
index c364b4d..acd2bed 100755
--- a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/Services.java
+++ b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/Services.java
@@ -151,7 +151,7 @@ public class Services {
     this.joinLeave.started();
     this.healthMon.started();
     this.manager.started();
-    logger.info("Attempting to join the distributed system");
+    logger.debug("All membership services have been started");
     try {
       this.manager.joinDistributedSystem();
     } catch (Throwable e) {

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/eab327f6/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/fd/GMSHealthMonitor.java
----------------------------------------------------------------------
diff --git a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/fd/GMSHealthMonitor.java b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/fd/GMSHealthMonitor.java
index ae867d1..19ef50f 100755
--- a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/fd/GMSHealthMonitor.java
+++ b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/fd/GMSHealthMonitor.java
@@ -1,7 +1,7 @@
 package com.gemstone.gemfire.distributed.internal.membership.gms.fd;
 
-import static com.gemstone.gemfire.internal.DataSerializableFixedID.PING_REQUEST;
-import static com.gemstone.gemfire.internal.DataSerializableFixedID.PING_RESPONSE;
+import static com.gemstone.gemfire.internal.DataSerializableFixedID.CHECK_REQUEST;
+import static com.gemstone.gemfire.internal.DataSerializableFixedID.CHECK_RESPONSE;
 import static com.gemstone.gemfire.internal.DataSerializableFixedID.SUSPECT_MEMBERS_MESSAGE;
 
 import java.util.ArrayList;
@@ -31,8 +31,8 @@ import com.gemstone.gemfire.distributed.internal.membership.NetView;
 import com.gemstone.gemfire.distributed.internal.membership.gms.Services;
 import com.gemstone.gemfire.distributed.internal.membership.gms.interfaces.HealthMonitor;
 import com.gemstone.gemfire.distributed.internal.membership.gms.interfaces.MessageHandler;
-import com.gemstone.gemfire.distributed.internal.membership.gms.messages.PingRequestMessage;
-import com.gemstone.gemfire.distributed.internal.membership.gms.messages.PingResponseMessage;
+import com.gemstone.gemfire.distributed.internal.membership.gms.messages.CheckRequestMessage;
+import com.gemstone.gemfire.distributed.internal.membership.gms.messages.CheckResponseMessage;
 import com.gemstone.gemfire.distributed.internal.membership.gms.messages.SuspectMembersMessage;
 import com.gemstone.gemfire.distributed.internal.membership.gms.messages.SuspectRequest;
 
@@ -61,7 +61,7 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
 
   private Services services;
   volatile private NetView currentView;
-  volatile private InternalDistributedMember nextNeighbour;
+  volatile private InternalDistributedMember nextNeighbor;
 
   long memberTimeout;
   volatile private boolean isStopping = false;
@@ -90,7 +90,7 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
 
   private ScheduledExecutorService scheduler;
 
-  private ExecutorService pingExecutor;
+  private ExecutorService checkExecutor;
 
   List<SuspectRequest> suspectRequests = new ArrayList<SuspectRequest>();
   private RequestCollector<SuspectRequest> suspectRequestCollectorThread;
@@ -121,6 +121,10 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
     CustomTimeStamp cTS = memberVsLastMsgTS.get(sender);
     if (cTS != null) {
       cTS.setTimeStamp(currentTimeStamp);
+    } else {
+      cTS = new CustomTimeStamp();
+      cTS.setTimeStamp(currentTimeStamp);
+      memberVsLastMsgTS.put(sender, cTS);
     }
   }
 
@@ -157,7 +161,7 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
     @Override
     public void run() {
 
-      InternalDistributedMember neighbour = nextNeighbour;
+      InternalDistributedMember neighbour = nextNeighbor;
       if (GMSHealthMonitor.this.isStopping) {
         return;
       }
@@ -166,12 +170,12 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
       GMSHealthMonitor.this.currentTimeStamp = currentTime;
 
       if (neighbour != null) {
-        CustomTimeStamp nextNeighbourTS;
+        CustomTimeStamp nextNeighborTS;
         synchronized(GMSHealthMonitor.this) {
-          nextNeighbourTS = GMSHealthMonitor.this.memberVsLastMsgTS.get(neighbour);
+          nextNeighborTS = GMSHealthMonitor.this.memberVsLastMsgTS.get(neighbour);
         }
 
-        if (nextNeighbourTS == null) {
+        if (nextNeighborTS == null) {
           CustomTimeStamp customTS = new CustomTimeStamp();
           customTS.setTimeStamp(currentTime);
           memberVsLastMsgTS.put(neighbour, customTS);
@@ -179,7 +183,7 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
         }
         
         long interval = memberTimeoutInMillis / GMSHealthMonitor.LOGICAL_INTERVAL;
-        long lastTS = currentTime - nextNeighbourTS.getTimeStamp();
+        long lastTS = currentTime - nextNeighborTS.getTimeStamp();
         if (lastTS + interval >= memberTimeoutInMillis) {
           logger.trace("Checking member {} ", neighbour);
           // now do check request for this member;
@@ -207,33 +211,33 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
 
   }
 
-  private PingRequestMessage constructPingRequestMessage(final InternalDistributedMember pingMember) {
+  private CheckRequestMessage constructCheckRequestMessage(final InternalDistributedMember mbr) {
     final int reqId = requestId.getAndIncrement();
-    final PingRequestMessage prm = new PingRequestMessage(pingMember, reqId);
-    prm.setRecipient(pingMember);
+    final CheckRequestMessage prm = new CheckRequestMessage(mbr, reqId);
+    prm.setRecipient(mbr);
 
     return prm;
   }
 
-  private void checkMember(final InternalDistributedMember pingMember) {
+  private void checkMember(final InternalDistributedMember mbr) {
     final NetView cv = GMSHealthMonitor.this.currentView;
 
-    // as ping may take time
-    setNextNeighbour(cv, pingMember);
+    // as check may take time
+    setNextNeighbor(cv, mbr);
 
-    // we need to ping this member
-    pingExecutor.execute(new Runnable() {
+    // we need to check this member
+    checkExecutor.execute(new Runnable() {
 
       @Override
       public void run() {
-        boolean pinged = GMSHealthMonitor.this.doCheckMember(pingMember);
+        boolean pinged = GMSHealthMonitor.this.doCheckMember(mbr);
         if (!pinged) {
-          String reason = String.format("Member isn't responding to check message: %s", pingMember);
-          GMSHealthMonitor.this.sendSuspectMessage(pingMember, reason);
+          String reason = String.format("Member isn't responding to check message: %s", mbr);
+          GMSHealthMonitor.this.sendSuspectMessage(mbr, reason);
         } else {
-          logger.trace("Setting next neighbour as member {} has not responded.", pingMember);
+          logger.trace("Setting next neighbour as member {} has not responded.", mbr);
           // back to previous one
-          setNextNeighbour(GMSHealthMonitor.this.currentView, null);
+          setNextNeighbor(GMSHealthMonitor.this.currentView, null);
         }
       }
     });
@@ -241,6 +245,10 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
   }
 
   private void sendSuspectMessage(InternalDistributedMember mbr, String reason) {
+    if (beingSick || playingDead) {
+      logger.debug("sick member is not sending suspect message concerning {}", mbr);
+      return;
+    }
     logger.debug("Suspecting {} reason=\"{}\"", mbr, reason);
     SuspectRequest sr = new SuspectRequest(mbr, reason);
     List<SuspectRequest> sl = new ArrayList<SuspectRequest>();
@@ -257,7 +265,7 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
   private boolean doCheckMember(InternalDistributedMember pingMember) {
     //TODO: need to some tcp check
     logger.trace("Checking member {}", pingMember);
-    final PingRequestMessage prm = constructPingRequestMessage(pingMember);
+    final CheckRequestMessage prm = constructCheckRequestMessage(pingMember);
     final Response pingResp = new Response();
     requestIdVsResponse.put(prm.getRequestId(), pingResp);
     try {
@@ -335,7 +343,7 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
       });
     }
     {
-      pingExecutor = Executors.newCachedThreadPool(new ThreadFactory() {
+      checkExecutor = Executors.newCachedThreadPool(new ThreadFactory() {
         AtomicInteger threadIdx = new AtomicInteger();
 
         @Override
@@ -371,7 +379,7 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
     synchronized (viewVsSuspectedMembers) {
       viewVsSuspectedMembers.clear();
     }
-    setNextNeighbour(newView, null);
+    setNextNeighbor(newView, null);
     currentView = newView;
   }
 
@@ -384,7 +392,7 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
    * It becomes null when we suspect current neighbour, during that time it watches
    * member next to suspect member.
    */
-  private synchronized void setNextNeighbour(NetView newView, InternalDistributedMember nextTo) {
+  private synchronized void setNextNeighbor(NetView newView, InternalDistributedMember nextTo) {
     if (nextTo == null) {
       nextTo = services.getJoinLeave().getMemberID();
     }
@@ -399,9 +407,9 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
     List<InternalDistributedMember> allMembers = newView.getMembers();
     int index = allMembers.indexOf(nextTo);
     if (index != -1) {
-      int nextNeighbourIndex = (index + 1) % allMembers.size();
-      nextNeighbour = allMembers.get(nextNeighbourIndex);
-      logger.trace("Next neighbour to check is {}", nextNeighbour);
+      int nextNeighborIndex = (index + 1) % allMembers.size();
+      nextNeighbor = allMembers.get(nextNeighborIndex);
+      logger.trace("Next neighbour to check is {}", nextNeighbor);
     }
 
     if (!sameView || memberVsLastMsgTS.size() == 0) {
@@ -420,16 +428,16 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
   }
 
   /*** test method */
-  public InternalDistributedMember getNextNeighbour() {
-    return nextNeighbour;
+  public InternalDistributedMember getNextNeighbor() {
+    return nextNeighbor;
   }
 
   @Override
   public void init(Services s) {
     services = s;
     memberTimeout = s.getConfig().getMemberTimeout();
-    services.getMessenger().addHandler(PingRequestMessage.class, this);
-    services.getMessenger().addHandler(PingResponseMessage.class, this);
+    services.getMessenger().addHandler(CheckRequestMessage.class, this);
+    services.getMessenger().addHandler(CheckResponseMessage.class, this);
     services.getMessenger().addHandler(SuspectMembersMessage.class, this);
   }
 
@@ -459,7 +467,7 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
         }
       }
 
-      pingExecutor.shutdown();
+      checkExecutor.shutdown();
     }
     {
       suspectRequestCollectorThread.shutdown();
@@ -470,7 +478,7 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
    * test method
    */
   public boolean isShutdown() {
-    return scheduler.isShutdown() && pingExecutor.isShutdown() && !suspectRequestCollectorThread.isAlive();
+    return scheduler.isShutdown() && checkExecutor.isShutdown() && !suspectRequestCollectorThread.isAlive();
   }
 
   @Override
@@ -505,33 +513,45 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
       return;
     }
 
-    logger.debug("HealthMonitor processing {}", m);
+    logger.trace("HealthMonitor processing {}", m);
 
     switch (m.getDSFID()) {
-    case PING_REQUEST:
-      processPingRequest((PingRequestMessage) m);
+    case CHECK_REQUEST:
+      if (beingSick || playingDead) {
+        logger.debug("sick member is ignoring check request");
+      } else {
+        processCheckRequest((CheckRequestMessage) m);
+      }
       break;
-    case PING_RESPONSE:
-      processPingResponse((PingResponseMessage) m);
+    case CHECK_RESPONSE:
+      if (beingSick || playingDead) {
+        logger.debug("sick member is ignoring check response");
+      } else {
+        processCheckResponse((CheckResponseMessage) m);
+      }
       break;
     case SUSPECT_MEMBERS_MESSAGE:
-      processSuspectMembersRequest((SuspectMembersMessage) m);
+      if (beingSick || playingDead) {
+        logger.debug("sick member is ignoring suspect message");
+      } else {
+        processSuspectMembersRequest((SuspectMembersMessage) m);
+      }
       break;
     default:
       throw new IllegalArgumentException("unknown message type: " + m);
     }
   }
 
-  private void processPingRequest(PingRequestMessage m) {
+  private void processCheckRequest(CheckRequestMessage m) {
     
-    if (beingSick || playingDead) {
+    if (this.isStopping || this.playingDead) {
       return;
     }
     
     // only respond if the intended recipient is this member
     InternalDistributedMember me = services.getMessenger().getMemberID();
     if (me.getVmViewId() < 0 || m.getTarget().equals(me)) {
-      PingResponseMessage prm = new PingResponseMessage(m.getRequestId());
+      CheckResponseMessage prm = new CheckResponseMessage(m.getRequestId());
       prm.setRecipient(m.getSender());
       Set<InternalDistributedMember> membersNotReceivedMsg = services.getMessenger().send(prm);
       // TODO: send is throwing exception right now
@@ -539,17 +559,17 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
         logger.debug("Unable to send check response to member: {}", m.getSender());
       }
     } else {
-      logger.debug("Ignoring ping request intended for {}.  My ID is {}", m.getTarget(), me);
+      logger.debug("Ignoring check request intended for {}.  My ID is {}", m.getTarget(), me);
     }
   }
 
-  private void processPingResponse(PingResponseMessage m) {
-    Response pingResp = requestIdVsResponse.get(m.getRequestId());
-    logger.debug("Got check response from member {}. {}", m.getSender(), (pingResp != null ? "Check Thread still waiting" : "Check thread is not waiting"));
-    if (pingResp != null) {
-      synchronized (pingResp) {
-        pingResp.setResponseMsg(m);
-        pingResp.notify();
+  private void processCheckResponse(CheckResponseMessage m) {
+    Response resp = requestIdVsResponse.get(m.getRequestId());
+    logger.trace("Got check response from member {}. {}", m.getSender(), (resp != null ? "Check Thread still waiting" : "Check thread is not waiting"));
+    if (resp != null) {
+      synchronized (resp) {
+        resp.setResponseMsg(m);
+        resp.notify();
       }
     }
   }
@@ -578,7 +598,13 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
 
     InternalDistributedMember localAddress = services.getJoinLeave().getMemberID();
 
+    InternalDistributedMember sender = incomingRequest.getSender();
+
     if (cv.getCoordinator().equals(localAddress)) {
+      for (SuspectRequest req: incomingRequest.getMembers()) {
+        logger.info("received suspect message from {}: {}",
+           sender, req.getReason());
+      }
       doFinalCheck(sMembers, cv, localAddress);
     }// coordinator ends
     else {
@@ -599,6 +625,10 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
       InternalDistributedMember coordinator = check.getCoordinator();
       if (coordinator != null && coordinator.equals(localAddress)) {
         // new coordinator
+        for (SuspectRequest req: incomingRequest.getMembers()) {
+          logger.info("received suspect message from {} for {}: {}",
+             sender, req.getSuspectMember(), req.getReason());
+        }
         doFinalCheck(smbr, cv, localAddress);
       } else {
         recordSuspectRequests(sMembers, cv);
@@ -649,9 +679,9 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
 
       if (view == null || !view.equals(cv)) {
         final String reason = sr.getReason();
-        logger.debug("Doing final check for member {}", mbr);
+        logger.debug("Doing final check for member {}; reason={}", mbr, reason);
         // its a coordinator
-        pingExecutor.execute(new Runnable() {
+        checkExecutor.execute(new Runnable() {
 
           @Override
           public void run() {
@@ -754,8 +784,9 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
             }
           } // synchronized
           if (requests != null && !requests.isEmpty()) {
-            if (logger != null && logger.isDebugEnabled())
+            if (logger != null && logger.isDebugEnabled()) {
               logger.debug("Health Monitor is sending {} member suspect requests to coordinator", requests.size());
+            }
             callback.process(requests);
             requests = null;
           }
@@ -768,6 +799,10 @@ public class GMSHealthMonitor implements HealthMonitor, MessageHandler {
   }
 
   private void sendSuspectRequest(final List<SuspectRequest> requests) {
+    if (beingSick || playingDead) {
+      logger.debug("sick member is not sending suspect request");
+      return;
+    }
     logger.debug("Sending suspect request for members {}", requests);
     synchronized (suspectRequests) {
       if (suspectRequests.size() > 0) {

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/eab327f6/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/locator/FindCoordinatorRequest.java
----------------------------------------------------------------------
diff --git a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/locator/FindCoordinatorRequest.java b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/locator/FindCoordinatorRequest.java
index da79b03..b1d7412 100755
--- a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/locator/FindCoordinatorRequest.java
+++ b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/locator/FindCoordinatorRequest.java
@@ -3,6 +3,7 @@ package com.gemstone.gemfire.distributed.internal.membership.gms.locator;
 import java.io.DataInput;
 import java.io.DataOutput;
 import java.io.IOException;
+import java.util.ArrayList;
 import java.util.Collection;
 
 import com.gemstone.gemfire.DataSerializer;
@@ -12,7 +13,8 @@ import com.gemstone.gemfire.distributed.internal.membership.InternalDistributedM
 import com.gemstone.gemfire.internal.DataSerializableFixedID;
 import com.gemstone.gemfire.internal.Version;
 
-public class FindCoordinatorRequest implements DataSerializableFixedID, PeerLocatorRequest {
+public class FindCoordinatorRequest extends HighPriorityDistributionMessage
+  implements PeerLocatorRequest {
 
   private InternalDistributedMember memberID;
   private Collection<InternalDistributedMember> rejectedCoordinators;
@@ -67,15 +69,31 @@ public class FindCoordinatorRequest implements DataSerializableFixedID, PeerLoca
   @Override
   public void toData(DataOutput out) throws IOException {
     DataSerializer.writeObject(this.memberID, out);
-    DataSerializer.writeObject(this.rejectedCoordinators, out);
+    if (this.rejectedCoordinators != null) {
+      out.writeInt(this.rejectedCoordinators.size());
+      for (InternalDistributedMember mbr: this.rejectedCoordinators) {
+        DataSerializer.writeObject(mbr, out);
+      }
+    } else {
+      out.writeInt(0);
+    }
     out.writeInt(lastViewId);
   }
 
   @Override
   public void fromData(DataInput in) throws IOException, ClassNotFoundException {
     this.memberID = DataSerializer.readObject(in);
-    this.rejectedCoordinators = DataSerializer.readObject(in);
+    int size = in.readInt();
+    this.rejectedCoordinators = new ArrayList<InternalDistributedMember>(size);
+    for (int i=0; i<size; i++) {
+      this.rejectedCoordinators.add((InternalDistributedMember)DataSerializer.readObject(in));
+    }
     this.lastViewId = in.readInt();
   }
 
+  @Override
+  protected void process(DistributionManager dm) {
+    throw new IllegalStateException("this message should not be executed");
+  }
+
 }

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/eab327f6/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/locator/FindCoordinatorResponse.java
----------------------------------------------------------------------
diff --git a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/locator/FindCoordinatorResponse.java b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/locator/FindCoordinatorResponse.java
index e475796..7177d04 100755
--- a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/locator/FindCoordinatorResponse.java
+++ b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/locator/FindCoordinatorResponse.java
@@ -3,29 +3,50 @@ package com.gemstone.gemfire.distributed.internal.membership.gms.locator;
 import java.io.DataInput;
 import java.io.DataOutput;
 import java.io.IOException;
+import java.util.HashSet;
+import java.util.Set;
 
 import com.gemstone.gemfire.DataSerializer;
+import com.gemstone.gemfire.distributed.internal.DistributionManager;
+import com.gemstone.gemfire.distributed.internal.HighPriorityDistributionMessage;
 import com.gemstone.gemfire.distributed.internal.membership.InternalDistributedMember;
+import com.gemstone.gemfire.distributed.internal.membership.NetView;
 import com.gemstone.gemfire.internal.DataSerializableFixedID;
+import com.gemstone.gemfire.internal.InternalDataSerializer;
 import com.gemstone.gemfire.internal.Version;
 
-public class FindCoordinatorResponse  implements DataSerializableFixedID {
+public class FindCoordinatorResponse  extends HighPriorityDistributionMessage
+    implements DataSerializableFixedID {
 
   private InternalDistributedMember coordinator;
+  private InternalDistributedMember senderId;
   private boolean fromView;
-  private int viewId;
+  private NetView view;
+  private Set<InternalDistributedMember> registrants;
   private boolean networkPartitionDetectionEnabled;
   private boolean usePreferredCoordinators;
+  private boolean isShortForm;
   
   
   public FindCoordinatorResponse(InternalDistributedMember coordinator,
-      boolean fromView, int viewId,
+      InternalDistributedMember senderId,
+      boolean fromView, NetView view, HashSet<InternalDistributedMember> registrants,
       boolean networkPartitionDectionEnabled, boolean usePreferredCoordinators) {
     this.coordinator = coordinator;
+    this.senderId = senderId;
     this.fromView = fromView;
-    this.viewId = viewId;
+    this.view = view;
+    this.registrants = registrants;
     this.networkPartitionDetectionEnabled = networkPartitionDectionEnabled;
     this.usePreferredCoordinators = usePreferredCoordinators;
+    this.isShortForm = false;
+  }
+  
+  public FindCoordinatorResponse(InternalDistributedMember coordinator,
+      InternalDistributedMember senderId) {
+    this.coordinator = coordinator;
+    this.senderId = senderId;
+    this.isShortForm = true;
   }
   
   public FindCoordinatorResponse() {
@@ -44,19 +65,32 @@ public class FindCoordinatorResponse  implements DataSerializableFixedID {
     return coordinator;
   }
   
+  public InternalDistributedMember getSenderId() {
+    return senderId;
+  }
+  
   public boolean isFromView() {
     return fromView;
   }
   
-  public int getViewId() {
-    return viewId;
+  public NetView getView() {
+    return view;
+  }
+  
+  public Set<InternalDistributedMember> getRegistrants() {
+    return registrants;
   }
   
   @Override
   public String toString() {
-    return "FindCoordinatorResponse(coordinator="+coordinator+", fromView="+fromView+", viewId="+viewId
+    if (this.isShortForm) { 
+      return "FindCoordinatorResponse(coordinator="+coordinator+")";
+    } else {
+      return "FindCoordinatorResponse(coordinator="+coordinator+", fromView="+fromView+", viewId="+view.getViewId()
+        +", registrants=" + registrants.size()
         +", network partition detection enabled="+this.networkPartitionDetectionEnabled
         +", locators preferred as coordinators="+this.usePreferredCoordinators+")";
+    }
   }
 
 
@@ -74,19 +108,32 @@ public class FindCoordinatorResponse  implements DataSerializableFixedID {
   @Override
   public void toData(DataOutput out) throws IOException {
     DataSerializer.writeObject(coordinator, out);
-    out.writeInt(viewId);
+    DataSerializer.writeObject(senderId, out);
+    out.writeBoolean(isShortForm);
     out.writeBoolean(fromView);
     out.writeBoolean(networkPartitionDetectionEnabled);
     out.writeBoolean(usePreferredCoordinators);
+    DataSerializer.writeObject(view, out);
+    InternalDataSerializer.writeSet(registrants, out);
   }
 
   @Override
   public void fromData(DataInput in) throws IOException, ClassNotFoundException {
     coordinator = DataSerializer.readObject(in);
-    viewId = in.readInt();
-    fromView = in.readBoolean();
-    networkPartitionDetectionEnabled = in.readBoolean();
-    usePreferredCoordinators = in.readBoolean();
+    senderId = DataSerializer.readObject(in);
+    isShortForm = in.readBoolean();
+    if (!isShortForm) {
+      fromView = in.readBoolean();
+      networkPartitionDetectionEnabled = in.readBoolean();
+      usePreferredCoordinators = in.readBoolean();
+      view = DataSerializer.readObject(in);
+      registrants = InternalDataSerializer.readHashSet(in);
+    }
+  }
+
+  @Override
+  protected void process(DistributionManager dm) {
+    throw new IllegalStateException("this message should not be executed");
   }
 
 }

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/eab327f6/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/locator/GMSLocator.java
----------------------------------------------------------------------
diff --git a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/locator/GMSLocator.java b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/locator/GMSLocator.java
index e0ee678..407ed9b 100755
--- a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/locator/GMSLocator.java
+++ b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/locator/GMSLocator.java
@@ -54,6 +54,7 @@ public class GMSLocator implements Locator, NetLocator {
   private final List<InetSocketAddress> locators;
   private Services services;
   private final LocatorStats stats;
+  private InternalDistributedMember localAddress;
   
   private Set<InternalDistributedMember> registrants = new HashSet<InternalDistributedMember>();
 
@@ -95,6 +96,7 @@ public class GMSLocator implements Locator, NetLocator {
     if (services == null || services.isStopped()) {
       logger.info("Peer locator is connecting to local membership services");
       services = ((GMSMembershipManager)mgr).getServices();
+      localAddress = services.getMessenger().getMemberID();
       services.setLocator(this);
       NetView newView = services.getJoinLeave().getView();
       if (newView != null) {
@@ -157,37 +159,32 @@ public class GMSLocator implements Locator, NetLocator {
         
         boolean fromView = false;
         int viewId = -1;
+        NetView v = this.view;
         
-        if (view != null) {
+        if (v != null) {
           // if the ID of the requester matches an entry in the membership view then remove
-          // that entry
+          // that entry - it's obviously an old member since the ID has been reused
           InternalDistributedMember rid = findRequest.getMemberID();
-          for (InternalDistributedMember id: view.getMembers()) {
+          for (InternalDistributedMember id: v.getMembers()) {
             if (rid.compareTo(id, false) == 0) {
-              NetView newView = new NetView(view, view.getViewId());
+              NetView newView = new NetView(v, v.getViewId());
               newView.remove(id);
-              this.view = newView;
+              v = newView;
               break;
             }
           }
-          viewId = view.getViewId();
+          viewId = v.getViewId();
           if (viewId > findRequest.getLastViewId()) {
             // ignore the requests rejectedCoordinators if the view has changed
-            coord = view.getCoordinator(Collections.<InternalDistributedMember>emptyList());
+            coord = v.getCoordinator(Collections.<InternalDistributedMember>emptyList());
           } else {
-            coord = view.getCoordinator(findRequest.getRejectedCoordinators());
+            coord = v.getCoordinator(findRequest.getRejectedCoordinators());
           }
           logger.debug("Peer locator: coordinator from view is {}", coord);
           fromView = true;
         }
         
-        if (coord != null) {
-          // no need to keep track of registrants after we're in the distributed system
-          synchronized(registrants) {
-            registrants.clear();
-          }
-          
-        } else {
+        if (coord == null) {
           // find the "oldest" registrant
           Collection<InternalDistributedMember> rejections = findRequest.getRejectedCoordinators();
           if (rejections == null) {
@@ -209,12 +206,16 @@ public class GMSLocator implements Locator, NetLocator {
             logger.debug("Peer locator: coordinator from registrations is {}", coord);
           }
         }
-        response = new FindCoordinatorResponse(coord, fromView, viewId,
-            this.networkPartitionDetectionEnabled, this.usePreferredCoordinators);
+        
+        synchronized(registrants) {
+          response = new FindCoordinatorResponse(coord, localAddress,
+              fromView, view, new HashSet<InternalDistributedMember>(registrants),
+              this.networkPartitionDetectionEnabled, this.usePreferredCoordinators);
+        }
       }
     }
     if (logger.isDebugEnabled()) {
-      logger.debug("Peer locator returning " + response);
+      logger.debug("Peer locator returning {}", response);
     }
     return response;
   }
@@ -288,8 +289,8 @@ public class GMSLocator implements Locator, NetLocator {
   
   private boolean recoverFromOthers() {
     for (InetSocketAddress other: this.locators) {
-      logger.info("Peer locator attempting to get state from " + other);
       if (recover(other)) {
+        logger.info("Peer locator recovered state from " + other);
         return true;
       }
     } // for
@@ -306,7 +307,7 @@ public class GMSLocator implements Locator, NetLocator {
         return true;
       }
     } catch (IOException | ClassNotFoundException ignore) {
-      logger.info("Peer locator could not recover membership view from {}: {}", other, ignore.getMessage());
+      logger.debug("Peer locator could not recover membership view from {}: {}", other, ignore.getMessage());
     }
     return false;
   }

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/eab327f6/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java
----------------------------------------------------------------------
diff --git a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java
index 8c55298..7b6b97d 100755
--- a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java
+++ b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java
@@ -6,6 +6,8 @@ import static com.gemstone.gemfire.internal.DataSerializableFixedID.JOIN_RESPONS
 import static com.gemstone.gemfire.internal.DataSerializableFixedID.LEAVE_REQUEST_MESSAGE;
 import static com.gemstone.gemfire.internal.DataSerializableFixedID.REMOVE_MEMBER_REQUEST;
 import static com.gemstone.gemfire.internal.DataSerializableFixedID.VIEW_ACK_MESSAGE;
+import static com.gemstone.gemfire.internal.DataSerializableFixedID.FIND_COORDINATOR_REQ;
+import static com.gemstone.gemfire.internal.DataSerializableFixedID.FIND_COORDINATOR_RESP;
 
 import java.io.IOException;
 import java.net.InetSocketAddress;
@@ -47,6 +49,7 @@ import com.gemstone.gemfire.distributed.internal.membership.gms.interfaces.JoinL
 import com.gemstone.gemfire.distributed.internal.membership.gms.interfaces.MessageHandler;
 import com.gemstone.gemfire.distributed.internal.membership.gms.locator.FindCoordinatorRequest;
 import com.gemstone.gemfire.distributed.internal.membership.gms.locator.FindCoordinatorResponse;
+import com.gemstone.gemfire.distributed.internal.membership.gms.messages.HasMemberID;
 import com.gemstone.gemfire.distributed.internal.membership.gms.messages.InstallViewMessage;
 import com.gemstone.gemfire.distributed.internal.membership.gms.messages.JoinRequestMessage;
 import com.gemstone.gemfire.distributed.internal.membership.gms.messages.JoinResponseMessage;
@@ -67,6 +70,9 @@ import com.gemstone.gemfire.security.AuthenticationFailedException;
 public class GMSJoinLeave implements JoinLeave, MessageHandler {
   
   public static String BYPASS_DISCOVERY = "gemfire.bypass-discovery";
+  
+  /** amount of time to wait for responses to FindCoordinatorRequests */
+  private static final int DISCOVERY_TIMEOUT = Integer.getInteger("gemfire.discovery-timeout", 3000);
 
   /** amount of time to sleep before trying to join after a failed attempt */
   private static final int JOIN_RETRY_SLEEP = Integer.getInteger("gemfire.join-retry-sleep", 1000);
@@ -77,6 +83,9 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
   /** time to wait for a leave request to be transmitted by jgroups */
   private static final long LEAVE_MESSAGE_SLEEP_TIME = Long.getLong("gemfire.leave-message-sleep-time", 1000);
   
+  /** if the locators don't know who the coordinator is we send find-coord requests to this many nodes */
+  private static final int MAX_DISCOVERY_NODES = Integer.getInteger("gemfire.max-discovery-nodes", 30);
+  
   /** membership logger */
   private static final Logger logger = Services.getLogger();
 
@@ -140,8 +149,32 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
   
   /** am I shutting down? */
   private volatile boolean isStopping;
+
+  /** state of collected artifacts during discovery */
+  final SearchState searchState = new SearchState();
   
+  /** a collection used to detect unit testing */
+  Set<String> unitTesting = new HashSet<>();
   
+  static class SearchState {
+    Set<InternalDistributedMember> alreadyTried = new HashSet<>();
+    Set<InternalDistributedMember> registrants = new HashSet<>();
+    InternalDistributedMember possibleCoordinator;
+    int viewId = -1;
+    boolean hasContactedALocator;
+    NetView view;
+    Set<FindCoordinatorResponse> responses = new HashSet<>();
+    
+    void cleanup() {
+      alreadyTried.clear();
+      possibleCoordinator = null;
+      view = null;
+      synchronized(responses) {
+        responses.clear();
+      }
+    }
+  }
+
   /**
    * attempt to join the distributed system
    * loop
@@ -161,28 +194,30 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
       return true;
     }
     
-    SearchState state = new SearchState();
+    SearchState state = searchState;
     
     long timeout = services.getConfig().getJoinTimeout();
+    logger.debug("join timeout is set to {}", timeout);
     long retrySleep =  JOIN_RETRY_SLEEP;
     long startTime = System.currentTimeMillis();
     long giveupTime = startTime + timeout;
 
     for (int tries=0; !this.isJoined; tries++) {
 
-      boolean found = findCoordinator(state);
+      boolean found = findCoordinator();
       if (found) {
         logger.debug("found possible coordinator {}", state.possibleCoordinator);
-        if (state.possibleCoordinator.equals(this.localAddress)) {
+        if (localAddress.getNetMember().preferredForCoordinator()
+            && state.possibleCoordinator.equals(this.localAddress)) {
           if (tries > 2 || System.currentTimeMillis() < giveupTime ) {
             becomeCoordinator();
             return true;
           }
         } else {
-          if (attemptToJoin(state)) {
+          if (attemptToJoin()) {
             return true;
           }
-          if (System.currentTimeMillis() < giveupTime) {
+          if (System.currentTimeMillis() > giveupTime) {
             break;
           }
           if (!state.possibleCoordinator.equals(localAddress)) {
@@ -190,10 +225,9 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
           }
         }
       } else {
-        if (System.currentTimeMillis() < giveupTime) {
+        if (System.currentTimeMillis() > giveupTime) {
           break;
         }
-        state.alreadyTried.clear();
       }
       try {
         Thread.sleep(retrySleep);
@@ -223,7 +257,9 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
    * @param coord
    * @return true if the attempt succeeded, false if it timed out
    */
-  private boolean attemptToJoin(SearchState state) {
+  private boolean attemptToJoin() {
+    SearchState state = searchState;
+    
     // send a join request to the coordinator and wait for a response
     InternalDistributedMember coord = state.possibleCoordinator;
     logger.info("Attempting to join the distributed system through coordinator " + coord + " using address " + this.localAddress);
@@ -375,6 +411,8 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
     else {
       if (!isStopping && !services.getCancelCriterion().isCancelInProgress()) {
         recordViewRequest(incomingRequest);
+        this.viewProcessor.processLeaveRequest(incomingRequest.getMemberID());
+        this.prepareProcessor.processLeaveRequest(incomingRequest.getMemberID());
       }
     }
   }
@@ -410,7 +448,12 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
     if (!isCoordinator && !isStopping && !services.getCancelCriterion().isCancelInProgress()) {
       logger.debug("JoinLeave is checking to see if I should become coordinator");
       NetView check = new NetView(v, v.getViewId()+1);
-      check.remove(mbr);
+      synchronized(removedMembers) {
+        removedMembers.add(mbr);
+        check = new NetView(v, v.getViewId());
+        check.addCrashedMembers(removedMembers);
+        check.removeAll(removedMembers);
+      }
       if (check.getCoordinator().equals(localAddress)) {
         becomeCoordinator(mbr);
       }
@@ -418,6 +461,8 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
     else {
       if (!isStopping && !services.getCancelCriterion().isCancelInProgress()) {
         recordViewRequest(incomingRequest);
+        this.viewProcessor.processRemoveRequest(mbr);
+        this.prepareProcessor.processRemoveRequest(mbr);
       }
     }
   }
@@ -454,6 +499,7 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
    * @param oldCoordinator may be null
    */
   private void becomeCoordinator(InternalDistributedMember oldCoordinator) {
+    boolean testing = unitTesting.contains("noRandomViewChange");
     stateLock.writeLock().lock();
     try {
       if (isCoordinator) {
@@ -467,27 +513,40 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
         this.localAddress.setVmViewId(0);
         installView(newView);
         isJoined = true;
-        startCoordinatorServices();
+        if (viewCreator == null || viewCreator.isShutdown()) {
+          viewCreator = new ViewCreator("GemFire Membership View Creator", Services.getThreadGroup());
+          viewCreator.setDaemon(true);
+          viewCreator.start();
+        }
       } else {
         // create and send out a new view
         NetView newView;
+        Set<InternalDistributedMember> leaving = new HashSet<>();
+        Set<InternalDistributedMember> removals;
         synchronized(viewInstallationLock) {
-          int viewNumber = currentView.getViewId() + 5;
+          int rand = testing? 0 : NetView.RANDOM.nextInt(10);
+          int viewNumber = currentView.getViewId() + 5 + rand;
           List<InternalDistributedMember> mbrs = new ArrayList<>(currentView.getMembers());
           if (!mbrs.contains(localAddress)) {
             mbrs.add(localAddress);
           }
-          Set<InternalDistributedMember> leaving = new HashSet<>();
-          if (oldCoordinator != null) {
-            leaving.add(oldCoordinator);
-          }
           synchronized(this.removedMembers) {
-            newView = new NetView(this.localAddress, viewNumber, mbrs, leaving,
-                this.removedMembers);
+            removals = new HashSet<>(this.removedMembers);
+          }
+          if (oldCoordinator != null && !removals.contains(oldCoordinator)) {
+            leaving.add(oldCoordinator);
           }
+          mbrs.removeAll(removals);
+          mbrs.removeAll(leaving);
+          newView = new NetView(this.localAddress, viewNumber, mbrs, leaving,
+              removals);
+        }
+        if (viewCreator == null || viewCreator.isShutdown()) {
+          viewCreator = new ViewCreator("GemFire Membership View Creator", Services.getThreadGroup());
+          viewCreator.setInitialView(newView, leaving, removals);
+          viewCreator.setDaemon(true);
+          viewCreator.start();
         }
-        sendView(newView, Collections.<InternalDistributedMember>emptyList());
-        startCoordinatorServices();
       }
     } finally {
       stateLock.writeLock().unlock();
@@ -526,7 +585,8 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
     InstallViewMessage msg = new InstallViewMessage(view, services.getAuthenticator().getCredentials(this.localAddress), preparing);
     Set<InternalDistributedMember> recips = new HashSet<>(view.getMembers());
 
-    recips.removeAll(newMembers); // new members get the view in a JoinResponseMessage
+    // a recent member was seen not to receive a new view - I think this is why
+//    recips.removeAll(newMembers); // new members get the view in a JoinResponseMessage
     recips.remove(this.localAddress); // no need to send it to ourselves
 
     Set<InternalDistributedMember> responders = recips;
@@ -548,19 +608,24 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
     }
     
     msg.setRecipients(recips);
+    
+    Set<InternalDistributedMember> pendingLeaves = getPendingRequestIDs(LEAVE_REQUEST_MESSAGE);
+    Set<InternalDistributedMember> pendingRemovals = getPendingRequestIDs(REMOVE_MEMBER_REQUEST);
     rp.initialize(id, responders);
+    rp.processPendingRequests(pendingLeaves, pendingRemovals);
     services.getMessenger().send(msg);
 
     // only wait for responses during preparation
     if (preparing) {
+logger.debug("waiting for view responses");
       Set<InternalDistributedMember> failedToRespond = rp.waitForResponses();
 
-      logger.info("View Creator is finished waiting for responses to view preparation");
+      logger.info("finished waiting for responses to view preparation");
       
       InternalDistributedMember conflictingViewSender = rp.getConflictingViewSender();
       NetView conflictingView = rp.getConflictingView();
       if (conflictingView != null) {
-        logger.warn("View Creator received a conflicting membership view from " + conflictingViewSender
+        logger.warn("received a conflicting membership view from " + conflictingViewSender
             + " during preparation: " + conflictingView);
         return false;
       }
@@ -637,9 +702,17 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
    * All locators are contacted.  If they don't agree then we choose the oldest
    * coordinator and return it.
    */
-  private boolean findCoordinator(SearchState state) {
+  private boolean findCoordinator() {
+    SearchState state = searchState;
+    
     assert this.localAddress != null;
     
+    // TODO - should we try more than one preferred coordinator
+    // before jumping to asking view-members who the coordinator is?
+    if ( !state.alreadyTried.isEmpty() && state.view != null) {
+      return findCoordinatorFromView();
+    }
+    
     FindCoordinatorRequest request = new FindCoordinatorRequest(this.localAddress, state.alreadyTried, state.viewId);
     Set<InternalDistributedMember> coordinators = new HashSet<InternalDistributedMember>();
     long giveUpTime = System.currentTimeMillis() + (services.getConfig().getLocatorWaitTime() * 1000L);
@@ -657,12 +730,18 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
           FindCoordinatorResponse response = (o instanceof FindCoordinatorResponse) ? (FindCoordinatorResponse)o : null;
           if (response != null && response.getCoordinator() != null) {
             anyResponses = false;
-            int viewId = response.getViewId();
+            NetView v = response.getView();
+            int viewId = v == null? -1 : v.getViewId();
             if (viewId > state.viewId) {
               // if the view has changed it is possible that a member
               // that we already tried to join with will become coordinator
               state.alreadyTried.clear();
               state.viewId = viewId;
+              state.view = v;
+              state.registrants.clear();
+              if (response.getRegistrants() != null) {
+                state.registrants.addAll(response.getRegistrants());
+              }
             }
             coordinators.add(response.getCoordinator());
             if (!flagsSet) {
@@ -680,6 +759,7 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
               services.getConfig().getDistributionConfig().setEnableNetworkPartitionDetection(enabled);
 
               if (response.isUsePreferredCoordinators()) {
+                this.quorumRequired = true;
                 logger.debug("The locator indicates that all locators should be preferred as coordinators");
                 if (services.getLocator() != null
                     || Locator.hasLocator()
@@ -687,6 +767,8 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
                     || localAddress.getVmKind() == DistributionManager.LOCATOR_DM_TYPE) {
                   ((GMSMember)localAddress.getNetMember()).setPreferredForCoordinator(true);
                 }
+              } else {
+                ((GMSMember)localAddress.getNetMember()).setPreferredForCoordinator(true);
               }
             }
           }
@@ -718,9 +800,92 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
       }
       state.possibleCoordinator = oldest;
     }
+    InternalDistributedMember coord = null;
+    boolean coordIsNoob = true;
+    for (; it.hasNext(); ) {
+      InternalDistributedMember mbr = it.next();
+      if (!state.alreadyTried.contains(mbr)) {
+        boolean mbrIsNoob = (mbr.getVmViewId() < 0);
+        if (mbrIsNoob) {
+          // member has not yet joined
+          if (coordIsNoob && (coord == null || coord.compareTo(mbr) > 0)) {
+            coord = mbr;
+          }
+        } else {
+          // member has already joined
+          if (coordIsNoob || mbr.getVmViewId() > coord.getVmViewId()) {
+            coord = mbr;
+            coordIsNoob = false;
+          }
+        }
+      }
+    }
     return true;
   }
   
+  boolean findCoordinatorFromView() {
+    ArrayList<FindCoordinatorResponse> result;
+    SearchState state = searchState;
+    NetView v = state.view;
+    List<InternalDistributedMember> recipients = new ArrayList(v.getMembers());
+
+    if (recipients.size() > MAX_DISCOVERY_NODES && MAX_DISCOVERY_NODES > 0) {
+      recipients = recipients.subList(0, MAX_DISCOVERY_NODES);
+    }
+    if (state.registrants != null) {
+      recipients.addAll(state.registrants);
+    }
+    recipients.remove(localAddress);
+    FindCoordinatorRequest req = new FindCoordinatorRequest(localAddress, state.alreadyTried, state.viewId);
+    req.setRecipients(v.getMembers());
+
+    boolean testing = unitTesting.contains("findCoordinatorFromView"); 
+    synchronized(state.responses) {
+      if (!testing) {
+        state.responses.clear();
+      }
+      services.getMessenger().send(req);
+      try {
+        if (!testing) {
+          state.responses.wait(DISCOVERY_TIMEOUT);
+        }
+      } catch (InterruptedException e) {
+        Thread.currentThread().interrupt();
+        return false;
+      }
+      result = new ArrayList<>(state.responses);
+      state.responses.clear();
+    }
+    
+    InternalDistributedMember coord = null;
+    if (localAddress.getNetMember().preferredForCoordinator()) {
+      // it's possible that all other potential coordinators are gone
+      // and this new member must become the coordinator
+      coord = localAddress;
+    }
+    boolean coordIsNoob = true;
+    for (FindCoordinatorResponse resp: result) {
+      InternalDistributedMember mbr = resp.getCoordinator();
+      if (!state.alreadyTried.contains(mbr)) {
+        boolean mbrIsNoob = (mbr.getVmViewId() < 0);
+        if (mbrIsNoob) {
+          // member has not yet joined
+          if (coordIsNoob && (coord == null || coord.compareTo(mbr) > 0)) {
+            coord = mbr;
+          }
+        } else {
+          // member has already joined
+          if (coordIsNoob || mbr.getVmViewId() > coord.getVmViewId()) {
+            coord = mbr;
+            coordIsNoob = false;
+          }
+        }
+      }
+    }
+    
+    state.possibleCoordinator = coord;
+    return coord != null;
+  }
   
   /**
    * receives a JoinResponse holding a membership view or rejection message
@@ -732,6 +897,24 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
       joinResponse.notify();
     }
   }
+  
+  private void processFindCoordinatorRequest(FindCoordinatorRequest req) {
+    FindCoordinatorResponse resp;
+    if (this.isJoined) {
+      NetView v = currentView;
+      resp = new FindCoordinatorResponse(v.getCoordinator(), localAddress);
+    } else {
+      resp = new FindCoordinatorResponse(localAddress, localAddress);
+    }
+    resp.setRecipient(req.getMemberID());
+    services.getMessenger().send(resp);
+  }
+  
+  private void processFindCoordinatorResponse(FindCoordinatorResponse resp) {
+    synchronized(searchState.responses) {
+      searchState.responses.add(resp);
+    }
+  }
 
   @Override
   public NetView getView() {
@@ -873,15 +1056,6 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
   }
   
   
-  /** invoke this under the viewInstallationLock */
-  private void startCoordinatorServices() {
-    if (viewCreator == null || viewCreator.isShutdown()) {
-      viewCreator = new ViewCreator("GemFire Membership View Creator", Services.getThreadGroup());
-      viewCreator.setDaemon(true);
-      viewCreator.start();
-    }
-  }
-  
   private void stopCoordinatorServices() {
     if (viewCreator != null && !viewCreator.isShutdown()) {
       viewCreator.shutdown();
@@ -993,20 +1167,12 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
       RemoveMemberMessage msg = new RemoveMemberMessage(v.getPreferredCoordinators(filter, getMemberID(), 5), 
           m,
           reason);
-      if (this.isCoordinator) {
-        msg.setSender(this.localAddress);
-        processRemoveRequest(msg);
-      } else {
-        final NetView check;
-        synchronized(removedMembers) {
-          removedMembers.add(m);
-          check = new NetView(v, v.getViewId());
-          check.addCrashedMembers(removedMembers);
-          check.removeAll(removedMembers);
-        }
-        if (check.getCoordinator().equals(this.localAddress)) {
-          becomeCoordinator(v.getCoordinator());
-        }
+      msg.setSender(this.localAddress);
+      processRemoveRequest(msg);
+      if (!this.isCoordinator) {
+        msg.resetRecipients();
+        msg.setRecipients(v.getPreferredCoordinators(Collections.<InternalDistributedMember>emptySet(),
+            localAddress, 10));
         services.getMessenger().send(msg);
       }
     }
@@ -1048,6 +1214,8 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
     services.getMessenger().addHandler(RemoveMemberMessage.class, this);
     services.getMessenger().addHandler(JoinRequestMessage.class, this);
     services.getMessenger().addHandler(JoinResponseMessage.class, this);
+    services.getMessenger().addHandler(FindCoordinatorRequest.class, this);
+    services.getMessenger().addHandler(FindCoordinatorResponse.class, this);
 
     int ackCollectionTimeout = dc.getMemberTimeout() * 2 * 12437 / 10000;
     if (ackCollectionTimeout < 1500) {
@@ -1090,35 +1258,84 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
     case REMOVE_MEMBER_REQUEST:
       processRemoveRequest((RemoveMemberMessage)m);
       break;
+    case FIND_COORDINATOR_REQ:
+      processFindCoordinatorRequest((FindCoordinatorRequest)m);
+      break;
+    case FIND_COORDINATOR_RESP:
+      processFindCoordinatorResponse((FindCoordinatorResponse)m);
+      break;
     default:
       throw new IllegalArgumentException("unknown message type: " + m);
     }
   }
   
 
+  /**
+   * returns the member IDs of the pending requests having the given
+   * DataSerializableFixedID
+   */
+  Set<InternalDistributedMember> getPendingRequestIDs(int theDSFID) {
+    Set<InternalDistributedMember> result = new HashSet<>();
+    synchronized(viewRequests) {
+      for (DistributionMessage msg: viewRequests) {
+        if (msg.getDSFID() == theDSFID) {
+          result.add(((HasMemberID)msg).getMemberID());
+        }
+      }
+    }
+    return result;
+  }
   
   
   class ViewReplyProcessor {
     volatile int viewId = -1;
-    volatile Set<InternalDistributedMember> recipients;
-    volatile NetView conflictingView;
-    volatile InternalDistributedMember conflictingViewSender;
-    volatile boolean waiting;
+    final Set<InternalDistributedMember> notRepliedYet = new HashSet<>();
+    NetView conflictingView;
+    InternalDistributedMember conflictingViewSender;
+    boolean waiting;
     final boolean isPrepareViewProcessor;
+    final Set<InternalDistributedMember> pendingRemovals = new HashSet<>();
     
     ViewReplyProcessor(boolean forPreparation) {
       this.isPrepareViewProcessor = forPreparation;
     }
     
-    void initialize(int viewId, Set<InternalDistributedMember> recips) {
-      this.waiting = true;
+    synchronized void initialize(int viewId, Set<InternalDistributedMember> recips) {
+      waiting = true;
       this.viewId = viewId;
-      this.recipients = recips;
-      this.conflictingView = null;
+      notRepliedYet.clear();
+      notRepliedYet.addAll(recips);
+      conflictingView = null;
+      pendingRemovals.clear();
+    }
+    
+    synchronized void processPendingRequests(Set<InternalDistributedMember> pendingLeaves,
+        Set<InternalDistributedMember> pendingRemovals) {
+      // there's no point in waiting for members who have already
+      // requested to leave or who have been declared crashed.
+      // We don't want to mix the two because pending removals
+      // aren't reflected as having crashed in the current view
+      // and need to cause a new view to be generated
+      notRepliedYet.removeAll(pendingLeaves);
+      synchronized(this.pendingRemovals) {
+        this.pendingRemovals.addAll(pendingRemovals);
+      }
+    }
+    
+    synchronized void processLeaveRequest(InternalDistributedMember mbr) {
+      if (waiting) {
+        stopWaitingFor(mbr);
+      }
     }
     
-    void processViewResponse(int viewId, InternalDistributedMember sender, NetView conflictingView) {
-      if (!this.waiting) {
+    synchronized void processRemoveRequest(InternalDistributedMember mbr) {
+      if (waiting) {
+        pendingRemovals.add(mbr);
+      }
+    }
+    
+    synchronized void processViewResponse(int viewId, InternalDistributedMember sender, NetView conflictingView) {
+      if (!waiting) {
         return;
       }
       
@@ -1128,39 +1345,49 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
           this.conflictingView = conflictingView;
         }
 
-        Set<InternalDistributedMember> waitingFor = this.recipients;
-        synchronized(waitingFor) {
-          waitingFor.remove(sender);
-          if (waitingFor.isEmpty()) {
-            logger.debug("All view responses received - notifying waiting thread");
-            waitingFor.notify();
-          }
-        }
+        stopWaitingFor(sender);
+      }
+    }
 
+    /** call with synchronized(this) */
+    private void stopWaitingFor(InternalDistributedMember mbr) {
+      notRepliedYet.remove(mbr);
+      if (notRepliedYet.isEmpty() ||
+          (pendingRemovals != null && pendingRemovals.containsAll(notRepliedYet))) {
+        logger.debug("All anticipated view responses received - notifying waiting thread");
+        waiting = false;
+        notify();
       }
     }
     
     Set<InternalDistributedMember> waitForResponses() {
-      Set<InternalDistributedMember> result = this.recipients;
+      Set<InternalDistributedMember> result = this.notRepliedYet;
       long endOfWait = System.currentTimeMillis() + viewAckTimeout;
       try {
         while (System.currentTimeMillis() < endOfWait
             &&  (services.getCancelCriterion().cancelInProgress() == null)) {
           try {
-            synchronized(result) {
-              if (result.isEmpty() || this.conflictingView != null) {
+            synchronized(this) {
+              if (!waiting || result.isEmpty() || this.conflictingView != null) {
                 break;
               }
-              result.wait(1000);
+              wait(1000);
             }
           } catch (InterruptedException e) {
-            logger.debug("Interrupted while waiting for view resonses");
+            logger.debug("Interrupted while waiting for view responses");
             Thread.currentThread().interrupt();
             return result;
           }
         }
       } finally {
-        this.waiting = false;
+        if (!this.waiting) {
+          // if we've set waiting to false due to incoming messages then
+          // we've discounted receiving any other responses from the
+          // remaining members due to leave/crash notification
+          result = Collections.emptySet();
+        } else {
+          this.waiting = false;
+        }
       }
       return result;
     }
@@ -1174,7 +1401,7 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
     }
     
     Set<InternalDistributedMember> getUnresponsiveMembers() {
-      return this.recipients;
+      return this.notRepliedYet;
     }
   }
   
@@ -1186,6 +1413,10 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
     boolean shutdown = false;
     volatile boolean waiting = false;
     
+    NetView initialView;
+    Set<InternalDistributedMember> initialLeaving;
+    Set<InternalDistributedMember> initialRemovals;
+    
     ViewCreator(String name, ThreadGroup tg) {
       super(tg, name);
     }
@@ -1205,11 +1436,39 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
     boolean isWaiting() {
       return waiting;
     }
+    
+    /**
+     * All views should be sent by the ViewCreator thread, so
+     * if this member becomes coordinator it may have an initial
+     * view to transmit that announces the removal of the former coordinator to
+     * @param newView
+     * @param leaving - members leaving in this view
+     * @param removals - members crashed in this view
+     */
+    void setInitialView(NetView newView, Set<InternalDistributedMember> leaving, Set<InternalDistributedMember> removals) {
+      this.initialView = newView;
+      this.initialLeaving = leaving;
+      this.initialRemovals = removals;
+    }
+    
+    private void sendInitialView() {
+      if (initialView != null) {
+        try {
+          prepareAndSendView(initialView, Collections.<InternalDistributedMember>emptyList(),
+            initialLeaving, initialRemovals);
+        } finally {
+          this.initialView = null;
+          this.initialLeaving = null;
+          this.initialRemovals = null;
+        }
+      }
+    }
 
     @Override
     public void run() {
       List<DistributionMessage> requests = null;
       logger.info("View Creator thread is starting");
+      sendInitialView();
       long okayToCreateView = System.currentTimeMillis() + MEMBER_REQUEST_COLLECTION_INTERVAL;
       try {
         for (;;) {
@@ -1280,6 +1539,7 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
       } else {
         oldMembers = Collections.emptyList();
       }
+      Set<InternalDistributedMember> oldIDs = new HashSet<>();
       
       for (DistributionMessage msg: requests) {
         logger.debug("processing request {}", msg);
@@ -1288,7 +1548,14 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
         
         if (msg instanceof JoinRequestMessage) {
           mbr = ((JoinRequestMessage)msg).getMemberID();
-
+          // see if an old member ID is being reused.  If
+          // so we'll remove it from the new view
+          for (InternalDistributedMember m: oldMembers) {
+            if (mbr.compareTo(m, false) == 0) {
+              oldIDs.add(m);
+              break;
+            }
+          }
           if (!joinReqs.contains(mbr)) {
             joinReqs.add(mbr);
           }
@@ -1311,14 +1578,17 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
         }
       }
       
-      
+      for (InternalDistributedMember mbr: oldIDs) {
+        if (!leaveReqs.contains(mbr) && !removalReqs.contains(mbr)) {
+          removalReqs.add(mbr);
+          removalReasons.add("Removal of old ID that has been reused");
+        }
+      }
       
       if (removalReqs.isEmpty() && leaveReqs.isEmpty() && joinReqs.isEmpty()) {
         return;
       }
       
-      
-      
       NetView newView;
       synchronized(viewInstallationLock) {
         int viewNumber = 0;
@@ -1336,6 +1606,13 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
             removalReqs);
       }
       
+      // if there are no membership changes then abort creation of
+      // the new view
+      if (newView.getMembers().equals(currentView.getMembers())) {
+        logger.info("membership hasn't changed - aborting new view {}", newView);
+        return;
+      }
+      
       for (InternalDistributedMember mbr: joinReqs) {
         mbr.setVmViewId(newView.getViewId());
         mbr.getNetMember().setSplitBrainEnabled(services.getConfig().isNetworkPartitionDetectionEnabled());
@@ -1344,67 +1621,78 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
       // getting messages from members that have been kicked out
       sendRemoveMessages(removalReqs, removalReasons, newView);
       
-      // if there are no membership changes then abort creation of
-      // the new view
-      if (newView.getMembers().equals(currentView.getMembers())) {
-        logger.info("membership hasn't changed - aborting new view {}", newView);
-        return;
-      }
-      
       // we want to always check for quorum loss but don't act on it
       // unless network-partition-detection is enabled
       if ( !(isNetworkPartition(newView) && quorumRequired) ) {
         sendJoinResponses(joinReqs, newView);
       }
 
-      if (quorumRequired) {
-        boolean prepared = false;
-        do {
-          if (this.shutdown || Thread.currentThread().isInterrupted()) {
-            return;
-          }
-          prepared = prepareView(newView, joinReqs);
-          if (!prepared && quorumRequired) {
-            Set<InternalDistributedMember> unresponsive = prepareProcessor.getUnresponsiveMembers();
-            try {
-              removeHealthyMembers(unresponsive);
-            } catch (InterruptedException e) {
-              // abort the view if interrupted
-              shutdown = true;
-              return;
-            }
-  
-            if (!unresponsive.isEmpty()) {
-              List<InternalDistributedMember> failures = new ArrayList<>(currentView.getCrashedMembers().size() + unresponsive.size());
-              failures.addAll(unresponsive);
-
-              NetView conflictingView = prepareProcessor.getConflictingView();
-              if (conflictingView != null
-                  && !conflictingView.getCreator().equals(localAddress)
-                  && conflictingView.getViewId() > newView.getViewId()
-                  && (lastConflictingView == null || conflictingView.getViewId() > lastConflictingView.getViewId())) {
-                lastConflictingView = conflictingView;
-                failures.addAll(conflictingView.getCrashedMembers());
-              }
+      prepareAndSendView(newView, joinReqs, leaveReqs, removalReqs);
+      return;
+    }
+    
+    
+    /**
+     * This handles the 2-phase installation of the view
+     */
+    void prepareAndSendView(NetView newView,
+        List<InternalDistributedMember> joinReqs,
+        Set<InternalDistributedMember> leaveReqs,
+        Set<InternalDistributedMember> removalReqs) {
+      boolean prepared = false;
+      do {
+        if (this.shutdown || Thread.currentThread().isInterrupted()) {
+          return;
+        }
+        prepared = prepareView(newView, joinReqs);
+        if (prepared) {
+          break;
+        }
 
-              failures.removeAll(removalReqs);
-              if (failures.size() > 0) {
-                // abort the current view and try again
-                removalReqs.addAll(failures);
-                List<InternalDistributedMember> newMembers = new ArrayList<>(newView.getMembers());
-                newMembers.removeAll(removalReqs);
-                newView = new NetView(localAddress, newView.getViewId()+1, newMembers, leaveReqs,
-                    removalReqs);
-              }
-            }
-          }
-        } while (!prepared);
-      } // quorumRequired
+        Set<InternalDistributedMember> unresponsive = prepareProcessor.getUnresponsiveMembers();
+        unresponsive.removeAll(removalReqs);
+        unresponsive.removeAll(leaveReqs);
+        try {
+          removeHealthyMembers(unresponsive);
+        } catch (InterruptedException e) {
+          // abort the view if interrupted
+          shutdown = true;
+          return;
+        }
+
+        List<InternalDistributedMember> failures = new ArrayList<>(currentView.getCrashedMembers().size() + unresponsive.size());
+
+        NetView conflictingView = prepareProcessor.getConflictingView();
+        if (conflictingView != null
+            && !conflictingView.getCreator().equals(localAddress)
+            && conflictingView.getViewId() > newView.getViewId()
+            && (lastConflictingView == null || conflictingView.getViewId() > lastConflictingView.getViewId())) {
+          lastConflictingView = conflictingView;
+          logger.info("adding these crashed members from a conflicting view to the crash-set for the next view: {}\nconflicting view: {}", unresponsive, conflictingView);
+          failures.addAll(conflictingView.getCrashedMembers());
+        }
+
+        if (!unresponsive.isEmpty()) {
+          logger.info("adding these unresponsive members to the crash-set for the next view: {}", unresponsive);
+          failures.addAll(unresponsive);
+        }
+
+        failures.removeAll(removalReqs);
+        failures.removeAll(leaveReqs);
+        prepared = failures.isEmpty();
+        if (!prepared) {
+          // abort the current view and try again
+          removalReqs.addAll(failures);
+          List<InternalDistributedMember> newMembers = new ArrayList<>(newView.getMembers());
+          newMembers.removeAll(removalReqs);
+          newView = new NetView(localAddress, newView.getViewId()+1, newMembers, leaveReqs,
+              removalReqs);
+        }
+      } while (!prepared);
       
       lastConflictingView = null;
       
       sendView(newView, joinReqs);
-      return;
     }
     
     /**
@@ -1415,7 +1703,35 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
     private void removeHealthyMembers(Collection<InternalDistributedMember> mbrs) throws InterruptedException {
       List<Callable<InternalDistributedMember>> checkers = new ArrayList<Callable<InternalDistributedMember>>(mbrs.size()); 
       
+      Set<InternalDistributedMember> newRemovals = new HashSet<>();
+      Set<InternalDistributedMember> newLeaves = new HashSet<>();
+      
+      synchronized(viewRequests) {
+        for (DistributionMessage msg: viewRequests) {
+          switch (msg.getDSFID()) {
+          case LEAVE_REQUEST_MESSAGE:
+            newLeaves.add(((LeaveRequestMessage)msg).getMemberID());
+            break;
+          case REMOVE_MEMBER_REQUEST:
+            newRemovals.add(((RemoveMemberMessage)msg).getMemberID());
+            break;
+          default:
+            break;
+          }
+        }
+      }
+      
       for (InternalDistributedMember mbr: mbrs) {
+        if (newRemovals.contains(mbr)) {
+          // no need to do a health check on a member who is already leaving
+          logger.info("member {} is already scheduled for removal", mbr);
+          continue;
+        }
+        if (newLeaves.contains(mbr)) {
+          // no need to do a health check on a member that is declared crashed
+          logger.info("member {} has already sent a leave-request", mbr);
+          continue;
+        }
         final InternalDistributedMember fmbr = mbr;
         checkers.add(new Callable<InternalDistributedMember>() {
           @Override
@@ -1432,6 +1748,12 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
         });
       }
       
+      mbrs.removeAll(newLeaves);
+      
+      if (mbrs.isEmpty()) {
+        return;
+      }
+      
       ExecutorService svc = Executors.newFixedThreadPool(mbrs.size(), new ThreadFactory() {
         AtomicInteger i = new AtomicInteger();
         @Override
@@ -1440,7 +1762,7 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
               "Member verification thread " + i.incrementAndGet());
         }
       });
-
+      
       try {
         List<Future<InternalDistributedMember>> futures;
         futures = svc.invokeAll(checkers);
@@ -1449,7 +1771,6 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
           try {
             InternalDistributedMember mbr = future.get(viewAckTimeout, TimeUnit.MILLISECONDS);
             if (mbr != null) {
-              logger.debug("disregarding lack of acknowledgement from {}", mbr);
               mbrs.remove(mbr);
             }
           } catch (java.util.concurrent.TimeoutException e) {
@@ -1464,11 +1785,4 @@ public class GMSJoinLeave implements JoinLeave, MessageHandler {
     }
   }
   
-  private static class SearchState {
-    Set<InternalDistributedMember> alreadyTried = new HashSet<>();
-    InternalDistributedMember possibleCoordinator;
-    int viewId = -1;
-    private boolean hasContactedALocator;
-  }
-
 }

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/eab327f6/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/CheckRequestMessage.java
----------------------------------------------------------------------
diff --git a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/CheckRequestMessage.java b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/CheckRequestMessage.java
new file mode 100755
index 0000000..75f6b6e
--- /dev/null
+++ b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/CheckRequestMessage.java
@@ -0,0 +1,64 @@
+package com.gemstone.gemfire.distributed.internal.membership.gms.messages;
+
+import java.io.DataInput;
+import java.io.DataOutput;
+import java.io.IOException;
+
+import com.gemstone.gemfire.DataSerializer;
+import com.gemstone.gemfire.distributed.internal.DistributionManager;
+import com.gemstone.gemfire.distributed.internal.HighPriorityDistributionMessage;
+import com.gemstone.gemfire.distributed.internal.membership.InternalDistributedMember;
+import com.gemstone.gemfire.internal.Version;
+
+public class CheckRequestMessage extends HighPriorityDistributionMessage{
+
+  int requestId;
+  InternalDistributedMember target;
+  
+  public CheckRequestMessage(InternalDistributedMember neighbour, int id) {
+    requestId = id;
+    this.target = neighbour;
+  }
+  
+  public CheckRequestMessage(){}
+  
+  public InternalDistributedMember getTarget() {
+    return target;
+  }
+  
+  @Override
+  public int getDSFID() {
+    return CHECK_REQUEST;
+  }
+
+  @Override
+  protected void process(DistributionManager dm) {
+    throw new IllegalStateException("this message is not intended to execute in a thread pool");
+  }   
+
+  @Override
+  public String toString() {
+    return "CheckRequestMessage [requestId=" + requestId + "]";
+  }
+
+  public int getRequestId() {
+    return requestId;
+  }
+
+  @Override
+  public Version[] getSerializationVersions() {
+    return null;
+  }  
+  
+  @Override
+  public void toData(DataOutput out) throws IOException {
+    out.writeInt(requestId);
+    DataSerializer.writeObject(target, out);
+  }
+  
+  @Override
+  public void fromData(DataInput in) throws IOException, ClassNotFoundException {
+    requestId = in.readInt();
+    target = DataSerializer.readObject(in);
+  }
+}

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/eab327f6/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/CheckResponseMessage.java
----------------------------------------------------------------------
diff --git a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/CheckResponseMessage.java b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/CheckResponseMessage.java
new file mode 100755
index 0000000..b6f3735
--- /dev/null
+++ b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/CheckResponseMessage.java
@@ -0,0 +1,54 @@
+package com.gemstone.gemfire.distributed.internal.membership.gms.messages;
+
+import java.io.DataInput;
+import java.io.DataOutput;
+import java.io.IOException;
+
+import com.gemstone.gemfire.distributed.internal.DistributionManager;
+import com.gemstone.gemfire.distributed.internal.HighPriorityDistributionMessage;
+import com.gemstone.gemfire.internal.Version;
+
+public class CheckResponseMessage extends HighPriorityDistributionMessage {
+  int requestId;
+  
+  public CheckResponseMessage(int id) {
+    requestId = id;
+  }
+
+  public CheckResponseMessage(){}
+  
+  public int getRequestId() {
+    return requestId;
+  }
+
+
+  @Override
+  public int getDSFID() {
+    return CHECK_RESPONSE;
+  }
+
+  @Override
+  protected void process(DistributionManager dm) {
+    throw new IllegalStateException("this message is not intended to execute in a thread pool");
+  }
+ 
+  @Override
+  public String toString() {
+    return "CheckResponseMessage [requestId=" + requestId + "]";
+  }
+
+  @Override
+  public Version[] getSerializationVersions() {
+    return null;
+  }  
+
+  @Override
+  public void toData(DataOutput out) throws IOException {
+    out.writeInt(requestId);
+  }
+  
+  @Override
+  public void fromData(DataInput in) throws IOException, ClassNotFoundException {
+    requestId = in.readInt();
+  }
+}

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/eab327f6/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/InstallViewMessage.java
----------------------------------------------------------------------
diff --git a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/InstallViewMessage.java b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/InstallViewMessage.java
index d8616d0..fa9989f 100755
--- a/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/InstallViewMessage.java
+++ b/gemfire-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/InstallViewMessage.java
@@ -72,8 +72,7 @@ public class InstallViewMessage extends HighPriorityDistributionMessage {
 
   @Override
   public String toString() {
-    String s = getSender() == null? getRecipientsDescription() : ""+getSender();
-    return "InstallViewMessage("+s+"; preparing="+this.preparing+"; "+this.view
+    return "InstallViewMessage(preparing="+this.preparing+"; "+this.view
             +"; cred="+(credentials==null?"null": "not null")
              +")";
   }



Mime
View raw message