cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: distributing tokens equally along the key distribution space
Date Thu, 01 Oct 2009 17:42:10 GMT
You basically have two options.  You can wipe your data, change the
tokens, and reload things, or you can add new nodes with -b to
rebalance things that way.
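For the second option, it helps to know how much of the ring each node currently owns, and which token a new -b node should take to split the most loaded range. Below is a minimal Python sketch. It assumes that the RandomPartitioner token space runs from 0 to 2**127 and that a node owns the range from its predecessor's token (exclusive) to its own token (inclusive). ownership() and split_token() are hypothetical helpers, not part of Cassandra or nodeprobe.

```python
from fractions import Fraction

RING = 2 ** 127  # assumed size of the RandomPartitioner token space


def ownership(tokens):
    """Map each token to the fraction of the ring its node owns.

    Assumes a node owns (predecessor's token, own token], wrapping past
    2**127 back to 0. Hypothetical helper, not a Cassandra API.
    """
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return {ordered[0]: Fraction(1)}  # a lone node owns the whole ring
    shares = {}
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # for i == 0 this wraps to the last token
        shares[tok] = Fraction((tok - prev) % RING, RING)
    return shares


def split_token(tokens):
    """Token that halves the largest range, for a new node started with -b."""
    ordered = sorted(tokens)
    shares = ownership(ordered)
    # Index of the most loaded node; its predecessor bounds the largest range.
    i = max(range(len(ordered)), key=lambda k: shares[ordered[k]])
    prev = ordered[i - 1]
    span = (ordered[i] - prev) % RING or RING  # lone node: whole ring
    return (prev + span // 2) % RING


if __name__ == "__main__":
    ring = [0, RING // 4]
    for tok, share in ownership(ring).items():
        print(tok, float(share))
    print("new token:", split_token(ring))
```

For example, split_token([0, RING // 4]) gives 5 * RING // 8, the midpoint of the three-quarter range owned by the node at token 0. Under these assumptions, starting a new node with that token and -b, then running cleanup on the node whose range it split, moves the ring toward balance one node at a time.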

On Thu, Oct 1, 2009 at 12:34 PM, Igor Katkov <ikatkov@gmail.com> wrote:
> OK, so I don't need to use tokenupdater. What are the steps to rebalance
> data around the circle?
>
> In my test example (see below), I have A, D, B and C (clockwise) where
> A holds 1/3 of the data
> D - 1/6
> B - 1/6
> C - 1/3
> I'm willing to change tokens manually; that's fine.
> How do I tell all nodes to move data around in version 0.4? Do I change
> the token on node A and restart it with -b, then do the same for the rest,
> restarting only one node at a time?
>
> On Thu, Oct 1, 2009 at 1:22 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> tokenupdater does not move data around; it's just an alternative to
>> setting <initialtoken> on each node. So you really want to get your
>> tokens right for your initial set of nodes before adding data.
>>
>> We're finishing up full load balancing for 0.5, but even then it's best
>> to start with a reasonable distribution instead of starting with
>> random tokens and forcing the balancer to move things around a bunch.
>>
>> On Thu, Oct 1, 2009 at 12:14 PM, Igor Katkov <ikatkov@gmail.com> wrote:
>> > What is the correct procedure for data re-partitioning?
>> > Suppose I have 3 nodes - "A", "B", "C"
>> > tokens on the ring:
>> > A: 0
>> > B: 2.8356863910078205288614550619314e+37
>> > C: 5.6713727820156410577229101238628e+37
>> >
>> > Then I add node "D", token: 1.4178431955039102644307275309655e+37 (B/2)
>> > Start node "D" with -b
>> > Wait
>> > Run nodeprobe -host hostB ... cleanup on live "B"
>> > Wait
>> > Done
>> >
>> > Now data is not evenly balanced because tokens are not evenly spaced.
>> > I see that there is tokenupdater (org.apache.cassandra.tools.TokenUpdater).
>> > What happens to keys and data if I run it on "A", "B", "C" and "D" with
>> > new, better-spaced tokens? Should I? Is there a better procedure?
>> >
>> > On Thu, Oct 1, 2009 at 12:48 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> >>
>> >> On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <ikatkov@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > Question #1:
>> >> > How do I manually select tokens to force equal spacing around the
>> >> > hash space?
>> >>
>> >> (Answered by Jun.)
>> >>
>> >> > Question #2:
>> >> > Let's assume that #1 was resolved somehow and key distribution is more
>> >> > or less even.
>> >> > A new node "C" joins the cluster. Its token falls somewhere between two
>> >> > other tokens on the ring (from nodes "A" and "B", clockwise-ordered).
>> >> > From now on "C" is responsible for a portion of data that used to
>> >> > belong exclusively to "B".
>> >> > a. Cassandra 0.4 will not automatically transfer this data to "C",
>> >> > will it?
>> >>
>> >> It will, if you start C with the -b ("bootstrap") flag.
>> >>
>> >> > b. Do all reads to these keys fail?
>> >>
>> >> No.
>> >>
>> >> > c. What happens to the data referenced by these keys on "B"? It will
>> >> > never be accessed there, so it becomes garbage. Since there is no GC,
>> >> > will it stick around forever?
>> >>
>> >> nodeprobe cleanup after the bootstrap completes will instruct B to
>> >> throw out data that has been copied to C.
>> >>
>> >> > d. What happens to replicas of these keys?
>> >>
>> >> These are also handled by -b.
>> >>
>> >> -Jonathan
>> >
>> >
>
>
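On Question #1, a common approach to spacing tokens evenly by hand is to give node i of N the token i * 2**127 / N. Below is a minimal sketch, assuming RandomPartitioner's 0 to 2**127 token space. balanced_tokens() is a hypothetical helper, and each value would go into that node's <initialtoken> setting before any data is loaded.

```python
RING = 2 ** 127  # assumed size of the RandomPartitioner token space


def balanced_tokens(node_count):
    """Return node_count tokens spaced RING / node_count apart, starting at 0.

    Hypothetical helper: node i gets i * RING // node_count.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING // node_count for i in range(node_count)]


if __name__ == "__main__":
    # One initial token per node for a 4-node ring, in clockwise order.
    for name, token in zip("ADBC", balanced_tokens(4)):
        print(name, token)
```

With four nodes this yields tokens a quarter of the ring apart, which is the spacing A, D, B and C would need for equal shares. Under the wipe-and-reload option, these are the tokens you would assign before reloading.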

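On Question #2d, -b has to move replicas as well because a key's replicas follow ring order, so a new token changes which nodes hold them. Below is a minimal sketch. It assumes rack-unaware placement: the node owning the key's token, plus the next replication_factor - 1 nodes clockwise. replicas() is a hypothetical helper, and the tokens are toy values.

```python
from bisect import bisect_left


def replicas(ring, key_token, replication_factor):
    """Return the names of the nodes holding replicas of key_token.

    ring maps each node's token to its name. Assumed rack-unaware placement:
    the primary is the first node whose token is >= key_token (wrapping to
    the lowest token), and the other replicas are the next nodes clockwise.
    """
    ordered = sorted(ring)
    start = bisect_left(ordered, key_token) % len(ordered)
    count = min(replication_factor, len(ordered))
    return [ring[ordered[(start + k) % len(ordered)]] for k in range(count)]


if __name__ == "__main__":
    before = {0: "A", 100: "B", 200: "C"}
    after = {**before, 50: "D"}  # node D bootstraps at token 50
    print(replicas(before, 30, 2))
    print(replicas(after, 30, 2))
```

Take toy tokens A=0, B=100 and C=200 with a replication factor of 2. Key token 30 lives on B and C. After D joins at 50, it lives on D and B, so D needs a copy of that replica. C's leftover copy is the kind of data cleanup is meant to remove.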