What is the correct procedure for data re-partitioning?
Suppose I have 3 nodes - "A", "B", "C"
tokens on the ring:
A: 0
B: 2.8356863910078205288614550619314e+37
C: 5.6713727820156410577229101238628e+37

Then I add node "D" with token 1.4178431955039102644307275309655e+37 (B/2):
Start node "D" with -b
Wait for the bootstrap to finish
Run nodeprobe -host hostB ... cleanup on live "B"
Wait for the cleanup to finish
Done

Now the data is not evenly balanced because the tokens are not evenly spaced. I see that there is a TokenUpdater tool (org.apache.cassandra.tools.TokenUpdater).
What happens to the keys and data if I run it on "A", "B", "C" and "D" with new, better-spaced tokens? Should I? Is there a better procedure?
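
For reference, "better-spaced" tokens for four nodes could be computed along these lines. This is only a minimal sketch: it assumes the token space runs from 0 to 2**127 (as with RandomPartitioner), and the function name and RING_SIZE constant are illustrative, not part of Cassandra.

```python
# Assumption: tokens live in [0, 2**127), as with RandomPartitioner.
RING_SIZE = 2**127


def evenly_spaced_tokens(node_count, ring_size=RING_SIZE):
    """Return node_count tokens that split the ring into equal ranges."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    # Integer arithmetic keeps full precision; floats would round the tokens.
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    for name, token in zip("ABCD", evenly_spaced_tokens(4)):
        print(name, token)
```

The tokens are printed as exact integers rather than in the rounded scientific notation used above, since a token that is off by even one changes which node owns the boundary key.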




On Thu, Oct 1, 2009 at 12:48 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <ikatkov@gmail.com> wrote:
> Hi,
>
> Question#1:
> How do I manually select tokens to force equal spacing of tokens around the
> hash space?

(Answered by Jun.)

> Question#2:
> Let's assume that #1 was resolved somehow and key distribution is more or
> less even.
> A new node "C" joins the cluster. Its token falls somewhere between two
> other tokens on the ring (from nodes "A" and "B" clockwise-ordered). From
> now on "C" is responsible for a portion of data that used to exclusively
> belong to "B".
> a. Cassandra v.0.4 will not automatically transfer this data to "C", will it?

It will, if you start C with the -b ("bootstrap") flag.
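
To make the range hand-off concrete, here is a toy model of the ring. It is a sketch under stated assumptions, not Cassandra code: the tokens (0, 50, 100) and keys are made-up small integers, the owner() helper is hypothetical, and it assumes a node owns every key token greater than the previous node's token and up to and including its own.

```python
from bisect import bisect_left


def owner(ring, key_token):
    """Return the node with the first token >= key_token, wrapping around.

    ring maps token -> node name. Assumed convention: a node owns the
    range (previous token, its own token].
    """
    tokens = sorted(ring)
    idx = bisect_left(tokens, key_token)
    return ring[tokens[idx % len(tokens)]]


# Toy ring before and after "C" joins between "A" and "B".
before = {0: "A", 100: "B"}
after = dict(before)
after[50] = "C"

keys = [10, 40, 50, 60, 90, 100, 120]
# Keys whose owner changes once "C" is on the ring.
moved = [k for k in keys if owner(before, k) != owner(after, k)]

if __name__ == "__main__":
    for k in keys:
        print(k, owner(before, k), "->", owner(after, k))
```

In this model only keys 10, 40 and 50 change owner, and all of them previously belonged to "B". That is the slice of data the bootstrap has to copy to "C".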

> b. Do all reads to these keys fail?

No.

> c. What happens with the data referenced by these keys on "B"? It will never
> be accessed there, therefore it becomes garbage. Since there is no GC, will
> it stick around forever?

nodeprobe cleanup after the bootstrap completes will instruct B to
throw out data that has been copied to C.

> d. What happens to replicas of these keys?

These are also handled by -b.

-Jonathan