So in theory, one could double a cluster by:
1) moving snapshots of each node to a new node.
2) for each snapshot moved, figure out the primary range of the new node by
taking the old node's primary range token and calculating the midpoint
value between that and the next primary range start token
3) the RFs should be preserved, since the snapshots hold a replicated set of
data for the old primary range, the next primary range already has its
replicas, and so does the n+1 primary range.
The data distribution will be the same as the old primary range distribution.
Then nodetool cleanup and repair would get rid of old data ranges that are
no longer needed.
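
To make step 2 concrete, here is a rough sketch of the midpoint calculation.
It assumes the Murmur3Partitioner token range (-2^63 to 2^63-1) and handles
the wraparound at the end of the ring; the function names are made up for
illustration and are not part of Cassandra or Priam.

```python
# Sketch only: assumes Murmur3Partitioner tokens in [-2**63, 2**63 - 1].
RING_MIN = -2**63
RING_SIZE = 2**64


def midpoint(left, right):
    """Token halfway from `left` to `right`, walking clockwise around the ring."""
    span = (right - left) % RING_SIZE
    if span == 0:
        # A single token owns the whole ring.
        span = RING_SIZE
    mid = left + span // 2
    # Wrap the result back into the valid token range.
    return (mid - RING_MIN) % RING_SIZE + RING_MIN


def doubled_tokens(old_tokens):
    """For each existing token, the token a new node would take between it
    and the next token on the ring (hypothetical helper)."""
    ring = sorted(old_tokens)
    return [midpoint(t, ring[(i + 1) % len(ring)]) for i, t in enumerate(ring)]
```

With vnodes this would have to be done for every token a node owns (256 per
node by default), not just once per node.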
In practice, is this possible? I have heard Priam can double clusters and
they do not use vnodes. I am assuming they do a similar approach but they
only have to calculate single tokens?
> As I understand it: Replicas of data are replicated to the next primary
> range owner.
> As tokens are randomly generated (at least in 2.1.x that I am on), can't
> we have this situation:
>
> Say we have RF3, but the tokens happen to line up where:
>
> NodeA handles 0-10
> NodeB handles 11-20
> NodeA handles 21-30
> NodeB handles 31-40
> NodeC handles 40-50
> The key aspect of that is that the random assignment of primary range
> vnode tokens has resulted in NodeA and NodeB being the primaries for four
> adjacent primary ranges.
>
> If replicas are placed by going to the next adjacent nodes in the primary
> range, and we are at, say, RF3, then B will have a replica of A, and then
> the THIRD REPLICA IS BACK ON A.
> Is the replica placement robust to this, by ignoring the reappearance of A
> and then cycling through until a unique node (NodeC) is encountered, and
> then that becomes the third replica?
>
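
For what it's worth, my understanding is that SimpleStrategy does behave this
way: it walks the ring clockwise and only counts distinct nodes as replicas.
Below is a small sketch of that walk, using the node names from the example
above. The ring layout (each entry is the end token of a range) and the
function name are assumptions for illustration, and NetworkTopologyStrategy
additionally takes racks and datacenters into account.

```python
from bisect import bisect_left


def replicas_for(token, ring, rf):
    """Walk the ring clockwise from `token`, skipping nodes already chosen.

    `ring` is a list of (range_end_token, node) pairs sorted by token.
    """
    tokens = [t for t, _ in ring]
    # First range whose end token is >= the key's token, wrapping to the start.
    start = bisect_left(tokens, token) % len(ring)
    chosen = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == rf:
            break
    return chosen


# Ring from the example: A and B alternate before C appears.
ring = [(10, "NodeA"), (20, "NodeB"), (30, "NodeA"), (40, "NodeB"), (50, "NodeC")]
print(replicas_for(5, ring, 3))
```

In this sketch the second appearance of NodeA and NodeB is skipped, so the
third replica for a key in the first range lands on NodeC.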
