incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: distributing tokens equally along the key distribution space
Date Thu, 01 Oct 2009 18:04:14 GMT
yes

On Thu, Oct 1, 2009 at 12:49 PM, Igor Katkov <ikatkov@gmail.com> wrote:
> I see, so to keep the cluster always balanced (data-wise), the number of
> nodes should be doubled each time.
> I see some activity in JIRA regarding load-balancing for v.0.5.
> Does it target the same thing? Transferring data from node to node and
> appropriately modifying tokens?
>
> On Thu, Oct 1, 2009 at 1:42 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> You basically have two options.  You can wipe your data, change the
>> tokens, and reload things, or you can add new nodes with -b to
>> rebalance things that way.
>>
>> On Thu, Oct 1, 2009 at 12:34 PM, Igor Katkov <ikatkov@gmail.com> wrote:
>> > OK, so I don't need to use tokenupdater, what are the steps to rebalance
>> > data around the circle?
>> >
>> > In my test example (see below), I have A, D, B and C (clockwise) where
>> > A holds 1/3 of the data
>> > D - 1/6
>> > B - 1/6
>> > C - 1/3
>> > I'm willing to change tokens manually, it's all right.
>> > How do I tell all nodes to move data around in version 0.4? Do I change
>> > the token on node A and restart it with -b? Then the same for the rest,
>> > restarting only one node at a time?
>> >
>> >
>> >
>> > On Thu, Oct 1, 2009 at 1:22 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> >>
>> >> tokenupdater does not move data around; it's just an alternative to
>> >> setting <initialtoken> on each node.  so you really want to get your
>> >> tokens right for your initial set of nodes before adding data.
>> >>
>> >> we're finishing up full load balancing for 0.5 but even then it's best
>> >> to start with a reasonable distribution instead of starting with
>> >> random and forcing the balancer to move things around a bunch.
>> >>
>> >> On Thu, Oct 1, 2009 at 12:14 PM, Igor Katkov <ikatkov@gmail.com> wrote:
>> >> > What is the correct procedure for data re-partitioning?
>> >> > Suppose I have 3 nodes - "A", "B", "C"
>> >> > tokens on the ring:
>> >> > A: 0
>> >> > B: 2.8356863910078205288614550619314e+37
>> >> > C: 5.6713727820156410577229101238628e+37
>> >> >
>> >> > Then I add node "D", token: 1.4178431955039102644307275309655e+37 (B/2)
>> >> > Start node "D" with -b
>> >> > Wait
>> >> > Run nodeprobe -host hostB ... cleanup on live "B"
>> >> > Wait
>> >> > Done
>> >> >
>> >> > Now data is not evenly balanced because tokens are not evenly spaced.
>> >> > I see that there is tokenupdater (org.apache.cassandra.tools.TokenUpdater).
>> >> > What happens with keys and data if I run it on "A", "B", "C" and "D"
>> >> > with new, better spaced tokens? Should I? Is there a better procedure?
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Oct 1, 2009 at 12:48 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> >> >>
>> >> >> On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <ikatkov@gmail.com> wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> > Question#1:
>> >> >> > How to manually select tokens to force equal spacing of tokens
>> >> >> > around the hash space?
>> >> >>
>> >> >> (Answered by Jun.)
>> >> >>
>> >> >> > Question#2:
>> >> >> > Let's assume that #1 was resolved somehow and key distribution is
>> >> >> > more or less even.
>> >> >> > A new node "C" joins the cluster. Its token falls somewhere between
>> >> >> > two other tokens on the ring (from nodes "A" and "B",
>> >> >> > clockwise-ordered). From now on "C" is responsible for a portion of
>> >> >> > data that used to exclusively belong to "B".
>> >> >> > a. Cassandra v.0.4 will not automatically transfer this data to "C",
>> >> >> > will it?
>> >> >>
>> >> >> It will, if you start C with the -b ("bootstrap") flag.
>> >> >>
>> >> >> > b. Do all reads to these keys fail?
>> >> >>
>> >> >> No.
>> >> >>
>> >> >> > c. What happens with the data referenced by these keys on "B"? It
>> >> >> > will never be accessed there, therefore it becomes garbage. Since
>> >> >> > there is no GC, will it stick around forever?
>> >> >>
>> >> >> nodeprobe cleanup after the bootstrap completes will instruct B to
>> >> >> throw out data that has been copied to C.
>> >> >>
>> >> >> > d. What happens to replicas of these keys?
>> >> >>
>> >> >> These are also handled by -b.
>> >> >>
>> >> >> -Jonathan
>> >> >
>> >> >
>> >
>> >
>
>
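
Editor's note: the token arithmetic discussed in this thread can be sketched
in a few lines of Python. This is an illustration only, not part of Cassandra
or its tools. The function names even_tokens and ownership are hypothetical,
and the ring size is an assumption: 2**127 is the commonly cited token range
for the random partitioner, while the tokens quoted in the thread are
consistent with a ring of 2**126, so it is left as a parameter. The sketch
also assumes each node owns the range from the previous token (exclusive) up
to its own token (inclusive), which matches the 1/3, 1/6, 1/6, 1/3 split
Igor describes.

```python
from fractions import Fraction

# Assumption: size of the token ring. Left as a parameter because the thread
# does not state it explicitly.
RING_SIZE = 2**127


def even_tokens(node_count, ring_size=RING_SIZE):
    """Return node_count tokens spaced evenly around the ring, starting at 0."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * ring_size // node_count for i in range(node_count)]


def ownership(tokens, ring_size=RING_SIZE):
    """Map each token to the fraction of the ring its node owns.

    Assumes a node owns the range (previous token, own token], wrapping
    around the ring, and that all tokens are distinct.
    """
    ordered = sorted(tokens)
    if len(set(ordered)) != len(ordered):
        raise ValueError("tokens must be distinct")
    result = {}
    for idx, token in enumerate(ordered):
        prev = ordered[idx - 1]  # idx 0 wraps around to the last token
        # A single node owns the whole ring, hence the "or ring_size".
        width = (token - prev) % ring_size or ring_size
        result[token] = Fraction(width, ring_size)
    return result


if __name__ == "__main__":
    # The thread's example on a scaled-down ring of size 12:
    # A=0, B=4, C=8 are evenly spaced; D=2 is then added at B/2.
    print(ownership([0, 4, 8], ring_size=12))
    print(ownership([0, 2, 4, 8], ring_size=12))
    # Doubling the node count keeps the old tokens and bisects every range.
    print(even_tokens(3, ring_size=12), even_tokens(6, ring_size=12))
```

On the scaled-down ring, adding D at B/2 splits only B's old range, which is
why the result is uneven. Doubling from 3 to 6 nodes bisects every range, so
the existing tokens stay valid and the ring remains balanced.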
