OK, so I don't need to use tokenupdater. What are the steps to rebalance data around the circle?
In my test example (see below), I have A, D, B and C (clockwise) where
A holds 1/3 of the data
D - 1/6
B - 1/6
C - 1/3
I'm willing to change tokens manually, it's all right.
How do I tell all nodes to move data around in version 0.4? Do I change the token on node A and restart it with -b, then do the same for the rest, restarting only one node at a time?
tokenupdater does not move data around; it's just an alternative to
setting <initialtoken> on each node. So you really want to get your
tokens right for your initial set of nodes before adding data.
We're finishing up full load balancing for 0.5, but even then it's best
to start with a reasonable distribution instead of starting with
random tokens and forcing the balancer to move things around a bunch.
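
For picking evenly spaced initial tokens, here is a minimal sketch. It assumes RandomPartitioner with a token space of 0 to 2**127; the names RING_SIZE and evenly_spaced_tokens are made up for illustration and are not part of any Cassandra tool. Each value would go into <initialtoken> on the corresponding node.

```python
# Sketch only: assumes RandomPartitioner, whose tokens fall in 0 .. 2**127.
# RING_SIZE and evenly_spaced_tokens are illustrative names, not Cassandra APIs.
RING_SIZE = 2 ** 127


def evenly_spaced_tokens(node_count):
    """Return node_count tokens spaced evenly around the ring, starting at 0."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps full precision; floats would round these values.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Node names are arbitrary; assign tokens in clockwise ring order.
    for name, token in zip("ABCD", evenly_spaced_tokens(4)):
        print(name, token)
```

Python integers are arbitrary precision, so the tokens print as exact decimal digits rather than the rounded e+37 notation.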
On Thu, Oct 1, 2009 at 12:14 PM, Igor Katkov <email@example.com> wrote:
> What is the correct procedure for data re-partitioning?
> Suppose I have 3 nodes - "A", "B", "C"
> tokens on the ring:
> A: 0
> B: 2.8356863910078205288614550619314e+37
> C: 5.6713727820156410577229101238628e+37
> Then I add node "D", token: 1.4178431955039102644307275309655e+37 (B/2)
> Start node "D" with -b
> Run nodeprobe -host hostB ... cleanup on live "B"
> Now data is not evenly balanced because tokens are not evenly spaced. I see
> that there is tokenupdater (org.apache.cassandra.tools.TokenUpdater).
> What happens with keys and data if I run it on "A", "B", "C" and "D" with
> new, better spaced tokens? Should I? Is there a better procedure?
> On Thu, Oct 1, 2009 at 12:48 PM, Jonathan Ellis <firstname.lastname@example.org> wrote:
>> On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <email@example.com> wrote:
>> > Hi,
>> > Question#1:
>> > How to manually select tokens to force equal spacing of tokens around
>> > the hash space?
>> (Answered by Jun.)
>> > Question#2:
>> > Let's assume that #1 was resolved somehow and key distribution is more
>> > or less even.
>> > A new node "C" joins the cluster. Its token falls somewhere between two
>> > other tokens on the ring (from nodes "A" and "B", clockwise-ordered).
>> > From now on "C" is responsible for a portion of data that used to
>> > belong exclusively to "B".
>> > a. Cassandra v.0.4 will not automatically transfer this data to "C",
>> > will it?
>> It will, if you start C with the -b ("bootstrap") flag.
>> > b. Do all reads to these keys fail?
>> > c. What happens with the data referenced by these keys on "B"? It will
>> > never be accessed there, therefore it becomes garbage. Since there is
>> > no GC, will it stick around forever?
>> nodeprobe cleanup after the bootstrap completes will instruct B to
>> throw out data that has been copied to C.
>> > d. What happens to replicas of these keys?
>> These are also handled by -b.