cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-9667) strongly consistent membership and ownership
Date Sat, 27 Jun 2015 14:06:04 GMT
Jason Brown created CASSANDRA-9667:

             Summary: strongly consistent membership and ownership
                 Key: CASSANDRA-9667
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: Jason Brown
            Assignee: Jason Brown
             Fix For: 3.x

Currently, users are advised to "wait two minutes between adding new nodes" so that the new
nodes' tokens, etc., have time to propagate. Further, as there's no coordination among joining
nodes with respect to token selection, new nodes can end up selecting ranges that overlap with
those of other joining nodes. This causes a lot of duplicate streaming from the existing source
nodes as they shovel out the bootstrap data for those new nodes.

This ticket proposes creating a mechanism that allows strongly consistent membership and ownership
changes in Cassandra, such that changes are performed in a linearizable and safe manner. The
basic idea is to use lightweight transaction (LWT) operations over a global system table, and to
leverage the linearizability of LWT to ensure the safety of cluster membership/ownership state
changes. This will necessitate changes to the existing workflows for node join, decommission,
remove, replace, and range move (there may be others I'm not thinking of), as well as changes
to nodetool.
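
To make the "LWT over a global system table" idea concrete, here is a minimal in-memory sketch of
the compare-and-set pattern it relies on. Everything in it is an assumption for illustration: the
ticket does not define a schema, so the epoch counter, the ClusterStateTable class, and the
change_state helper are hypothetical names, and a lock stands in for the Paxos round that a real
LWT would run.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class ClusterStateTable:
    """In-memory stand-in for a global system table that is only updated via LWT.

    Hypothetical: the 'epoch' column and this class are not part of the ticket.
    """
    epoch: int = 0
    members: dict = field(default_factory=dict)  # host -> state string
    _lock: threading.Lock = field(default_factory=threading.Lock,
                                  init=False, repr=False)

    def read(self):
        """Return a consistent snapshot of (epoch, members)."""
        with self._lock:
            return self.epoch, dict(self.members)

    def update_if_epoch(self, expected_epoch, host, state):
        """Mimic a conditional update: apply only if the epoch is unchanged."""
        with self._lock:
            if self.epoch != expected_epoch:
                return False  # another change won the race; caller must re-read
            self.members[host] = state
            self.epoch += 1
            return True


def change_state(table, host, state, max_attempts=10):
    """Read-then-CAS loop; returns the epoch at which the change was applied."""
    for _ in range(max_attempts):
        epoch, _ = table.read()
        if table.update_if_epoch(epoch, host, state):
            return epoch + 1
    raise RuntimeError("too much contention on cluster state")
```

Because every change must name the epoch it was based on, two concurrent changes cannot both
succeed against the same prior state; the loser re-reads and retries, which is what gives a
single total order of membership/ownership changes.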

Note: we distinguish between membership and ownership as follows: by membership we mean
"a host in this cluster and its state". By ownership, we mean "what tokens (or ranges)
does each node own"; a node must already be a member before it can be assigned tokens.

A rough sketch of what the 'add new node' workflow might look like: new nodes would
no longer create tokens themselves, but would instead contact a member of a Paxos cohort (via a
seed). The cohort member would generate the tokens and execute an LWT, ensuring
a linearizable change to the membership/ownership state. The updated state would then be
disseminated via the existing gossip.
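
The join workflow above can be sketched roughly as follows. This is a toy model under stated
assumptions, not a proposed implementation: the ticket does not say how the cohort member picks
tokens, so the "midpoint of the widest gap" policy, the OwnershipTable class, the
handle_join_request function, and the list used as a gossip stand-in are all hypothetical.

```python
RING_SIZE = 2**64  # assumed size of the token space, for illustration only


def pick_token(owned_tokens):
    """Return the midpoint of the widest gap on the ring (an assumed policy)."""
    if not owned_tokens:
        return 0
    tokens = sorted(owned_tokens)
    best_start, best_gap = None, -1
    for i, start in enumerate(tokens):
        end = tokens[(i + 1) % len(tokens)]
        # A single token owns the whole ring, so a zero-width gap means RING_SIZE.
        gap = (end - start) % RING_SIZE or RING_SIZE
        if gap > best_gap:
            best_start, best_gap = start, gap
    return (best_start + best_gap // 2) % RING_SIZE


class OwnershipTable:
    """Stand-in for the LWT-protected ownership state (hypothetical schema)."""

    def __init__(self):
        self.epoch = 0
        self.owners = {}  # token -> host

    def cas(self, expected_epoch, token, host):
        """Apply the change only if no other change landed since the read."""
        if self.epoch != expected_epoch or token in self.owners:
            return False
        self.owners[token] = host
        self.epoch += 1
        return True


def handle_join_request(table, new_host, gossip, max_attempts=5):
    """Run by the cohort member: choose a token, commit it, then disseminate."""
    for _ in range(max_attempts):
        epoch = table.epoch
        token = pick_token(table.owners)
        if table.cas(epoch, token, new_host):
            gossip.append((table.epoch, new_host, token))  # gossip stand-in
            return token
    raise RuntimeError("could not commit join for " + new_host)
```

The point of the sketch is the ordering: the token is chosen from the state that was read, and it
only becomes visible to gossip after the conditional write succeeds against that same state.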

As for joining specifically, I think we could support two modes: auto-mode and manual-mode.
Auto-mode is for adding a single new node per LWT operation, and would require no operator
intervention (much like today). In manual-mode, however, multiple new nodes could (somehow)
signal their intent to join the cluster, but would wait until an operator executes
a nodetool command that triggers the token generation and LWT operation for all pending
new nodes. This would allow better range partitioning and make the bootstrap streaming
more efficient, as we won't have overlapping range requests.
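
As an illustration of why planning all pending nodes in one step avoids overlapping range
requests, here is a small sketch of manual-mode token assignment on a toy ring. The policy
(splitting the widest existing gap into equal slices), the plan_batch and bootstrap_ranges
functions, and the ring size of 100 are assumptions made for the example; the ticket leaves both
the signalling mechanism and the token generation strategy open. The sketch also assumes at least
one node already owns a token.

```python
RING_SIZE = 100  # toy token space; a real partitioner's is far larger


def plan_batch(owners, pending):
    """Assign tokens to all pending hosts at once.

    owners:  dict of token -> host for existing nodes (must be non-empty)
    pending: list of hosts waiting for the operator to trigger the join
    """
    tokens = sorted(owners)
    gaps = []
    for i, start in enumerate(tokens):
        end = tokens[(i + 1) % len(tokens)]
        gaps.append(((end - start) % RING_SIZE or RING_SIZE, start))
    width, start = max(gaps)  # widest gap; ties go to the larger start token
    step = width // (len(pending) + 1)
    if step == 0:
        raise ValueError("gap too small for this batch")
    return {host: (start + n * step) % RING_SIZE
            for n, host in enumerate(pending, start=1)}


def bootstrap_ranges(owners, plan):
    """Return the (exclusive_start, inclusive_end) range each new host takes over."""
    ring = sorted(set(owners) | set(plan.values()))
    # Index -1 wraps to the last token, matching the ring's wrap-around.
    return {host: (ring[ring.index(tok) - 1], tok) for host, tok in plan.items()}
```

For example, with existing tokens 0 and 50 and two pending hosts, the plan places the new hosts
at 66 and 82, so they take over the adjacent ranges (50, 66] and (66, 82] and no source range
is requested twice.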
