Cluster join protocol requirements
Summary of requirements:
- support persistent cluster restart:
- persistent member crashes is re started - re-joins running cluster
- automatic restart after orderly shutdown
- manual recovery after total cluster failure
- support waiting for N initial members before going active.
- enforce consistency of broker options that need to be identical across cluster
- solve simultaneous join - initial cluster has >1 member
Terminology
Two phases:
- initialization: how initial members form an active cluster.
- joining: how new members join an active cluster
An active member has up broker state synchronized with the other active cluster members and is ready to serve clients.
An initializing member does not have broker state and cannot serve clients until it becomes active.
A signature is a collection of data that matches for stores that were part of the same cluster and is unlikely to match stores that were not. It may include:
- timestamp (co-ordinated across the cluster)
- cluster membership
- cluster name and other option values
- frame sequence number
Initial signature: set when the store was created and never changed. Identifies stores belonging to the same cluster.
Final signature: updated while cluster runs and set during orderly shutdown. Identifies matching clean stores. Used by manual recovery
When a broker is shut down as part of an orderly shut-down of the entire cluster, it marks its store as "clean", i.e. safe to restart with this store. Otherwise the store is considered "dirty".
Wait for N
New option: cluster-wait-for N. Wait for at least N initial members before going active.
- guarantees that clients will not be served till N members are active.
- faster startup: multiple members can start from store rather than sending updates.
Initialization
- Wait for N initial members
- Verify options are consistent or abort.
Define sets
- I=all the initial members
- C=members with a clean store
- D=members with a dirty store
- E=members with an empty store
- T=members with no store (transient)
If all members are transient or have empty stores (T+E=I) then all members go active. Members in E set the initial signature on their stores.
Else (some members have non-empty stores):
- Verify all non-empty stores (C+D) have the same initial signature or abort.
- Verify all clean stores C have same final signature or abort.
- If there are no clean stores abort, log message: need manual recovery.
- Else members without a clean store:
- get updates from members with clean store.
- set initial signature to same value as clean stores.
- when all updates done go active.
Joining
New member joins active cluster. Joining member has
- no/empty store: get an update from an active member.
- has store store:
- initial signature matches active members: get an update
- else abort - wrong store
Manual Recovery
If the entire cluster fails or is shut down dirty then manual recovery is required. Provide tools to examine broker data directories and determine if two signatures belong to the same cluster and if so which is the "latest" one.
Recovery procedure is to mark the latest store as clean and then restart the cluster.