What is the correct procedure for data re-partitioning?
Suppose I have three nodes, "A", "B", and "C", with these tokens on the ring:
Then I add node "D", token: 1.4178431955039102644307275309655e+37 (B/2)
Start node "D" with -b
Run nodeprobe -host hostB ... cleanup on live "B"
Now the data is not evenly balanced because the tokens are not evenly spaced. I see that there is a token updater tool (org.apache.cassandra.tools.TokenUpdater).
What happens to the keys and data if I run it on "A", "B", "C", and "D" with new, better-spaced tokens? Should I? Is there a better procedure?
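Choosing evenly spaced tokens by hand is simple arithmetic. Below is a minimal Python sketch. It assumes a RandomPartitioner-style ring whose tokens are integers between 0 and 2**127; other partitioners use different token spaces. With that assumption, token i of N nodes is i * 2**127 // N. The helpers `balanced_tokens` and `ownership` are hypothetical names. For reference, the token used for "D" above (about 1.4178e+37) is numerically 2**127 / 12. The A, B, C tokens in the demo are also hypothetical, because the original token list is not given.

```python
# Assumption: RandomPartitioner-style token space of integers in [0, 2**127).
RING_SIZE = 2 ** 127


def balanced_tokens(node_count: int) -> list[int]:
    """Return node_count tokens spaced RING_SIZE / node_count apart."""
    # Integer arithmetic keeps every digit; a float such as 1.41e+37 does not.
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens: list[int]) -> list[float]:
    """Fraction of the ring owned by each token, in sorted-token order.

    Each node owns the range (previous token, its own token], wrapping
    around from the largest token back to the smallest.
    """
    ordered = sorted(tokens)
    fractions = []
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # for i == 0 this wraps to the last token
        # A single node (span 0) owns the whole ring.
        span = (token - previous) % RING_SIZE or RING_SIZE
        fractions.append(span / RING_SIZE)
    return fractions


if __name__ == "__main__":
    print(balanced_tokens(4))
    # Hypothetical: A, B, C evenly spaced, then D added at 2**127 // 12.
    # Roughly 1/3, 1/12, 1/4, 1/3: D takes a slice of one node's range only.
    print(ownership(balanced_tokens(3) + [RING_SIZE // 12]))
```

Adding a single node to an evenly spaced ring always leaves it unbalanced in this model; a balanced 4-node layout needs all four tokens moved to the `balanced_tokens(4)` values.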
On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <firstname.lastname@example.org> wrote:
(Answered by Jun.)
> How to manually select tokens to force equal spacing of tokens around the
> hash space?
It will, if you start C with the -b ("bootstrap") flag.
> Let's assume that #1 was resolved somehow and key distribution is more or
> less even.
> A new node "C" joins the cluster. Its token falls somewhere between two
> other tokens on the ring (from nodes "A" and "B", clockwise-ordered). From
> now on "C" is responsible for a portion of the data that used to belong
> exclusively to "B".
> a. Cassandra v.0.4 will not automatically transfer this data to "C", will it?
> b. Do all reads to these keys fail?
> c. What happens with the data referenced by these keys on "B"? It will never
> be accessed there, therefore it becomes garbage. Since there is no GC, will
> it stick around forever?
nodeprobe cleanup after the bootstrap completes will instruct B to
throw out data that has been copied to C.
These are also handled by -b.
> d. What happens to replicas of these keys?
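The ownership questions in this exchange (C taking over part of B's range, what cleanup throws away on B, and where replicas live) can be illustrated with a small ring model. This is a minimal Python sketch under stated assumptions. Key tokens mimic a RandomPartitioner-style hash: the MD5 digest read as a signed 128-bit integer, then made non-negative. A key's primary owner is the first node whose token is at or after the key's token, wrapping around. Replicas follow rack-unaware placement on the next nodes clockwise. All function names and the A/B/C tokens are hypothetical. The real cleanup operates on on-disk data files, not an in-memory dict.

```python
import bisect
import hashlib

# Assumption: RandomPartitioner-style token space, integers in [0, 2**127].
RING_SIZE = 2 ** 127


def key_token(key: str) -> int:
    """Assumed RandomPartitioner-style token: |MD5 digest as a signed 128-bit int|."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def replica_nodes(ring: dict[int, str], token: int, replication_factor: int) -> list[str]:
    """Primary owner (first node token >= token, wrapping) plus the next nodes clockwise.

    Assumes rack-unaware placement: replicas go to the following nodes on the ring.
    """
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, token) % len(tokens)
    count = min(replication_factor, len(tokens))
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(count)]


def cleanup(node: str, ring: dict[int, str], local_rows: dict[int, str],
            replication_factor: int) -> dict[int, str]:
    """Keep only the rows this node is still a replica for; drop the rest."""
    return {token: row for token, row in local_rows.items()
            if node in replica_nodes(ring, token, replication_factor)}


if __name__ == "__main__":
    # Hypothetical tokens for illustration; the thread does not give real ones.
    ring = {RING_SIZE // 4: "A", RING_SIZE // 2: "B"}
    rows_on_b = {RING_SIZE // 3: "row x", 7 * RING_SIZE // 16: "row y"}

    print(replica_nodes(ring, RING_SIZE // 3, 1))   # ['B']
    ring[3 * RING_SIZE // 8] = "C"                   # C joins between A and B
    print(replica_nodes(ring, RING_SIZE // 3, 1))   # ['C']
    print(replica_nodes(ring, RING_SIZE // 3, 2))   # ['C', 'B']

    # Replication factor 1: B no longer owns "row x", so cleanup drops it.
    print(sorted(cleanup("B", ring, rows_on_b, 1).values()))  # ['row y']
    # Replication factor 2: B is still a replica for "row x", so it stays.
    print(sorted(cleanup("B", ring, rows_on_b, 2).values()))  # ['row x', 'row y']

    # Where an actual key would land on this hypothetical ring.
    print(replica_nodes(ring, key_token("user:42"), 2))
```

In this model a join changes both the primary owner and the replica set of the affected keys: before C joins, the key at token 2**127 // 3 is replicated on [B, A]; afterwards it is on [C, B], so A's copy also becomes data it no longer needs.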