incubator-cassandra-user mailing list archives

From Igor Katkov <ikat...@gmail.com>
Subject Re: distributing tokens equally along the key distribution space
Date Thu, 01 Oct 2009 17:49:19 GMT
I see, so to keep the cluster balanced (data-wise), the number of nodes
should be doubled each time.
I see some activity in JIRA regarding load balancing for v0.5.
Does it target the same thing, i.e. transferring data from node to node
and adjusting tokens accordingly?
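
The doubling rule can be checked with a minimal Python sketch. It assumes a ring of size 2**127 (the token space usually used with RandomPartitioner) and that each node owns the arc (previous token, its token]. The helpers `ownership` and `double` are hypothetical illustrations, not Cassandra tools. Inserting one new token halfway along every existing arc keeps all shares equal, while adding a single node does not.

```python
from fractions import Fraction

RING = 2**127  # assumption: RandomPartitioner-style token space [0, 2**127)


def ownership(tokens, ring=RING):
    """Map each token to the fraction of the ring it owns: the arc (previous token, token]."""
    ts = sorted(tokens)
    if len(ts) == 1:
        return {ts[0]: Fraction(1)}
    # ts[i - 1] for i == 0 is the largest token, so the first arc wraps around zero
    return {t: Fraction((t - ts[i - 1]) % ring, ring) for i, t in enumerate(ts)}


def double(tokens, ring=RING):
    """Return the old tokens plus one new token at the midpoint of every arc."""
    ts = sorted(tokens)
    mids = []
    for i, t in enumerate(ts):
        prev = ts[i - 1]
        gap = (t - prev) % ring or ring  # a lone token owns the whole ring
        mids.append((prev + gap // 2) % ring)
    return sorted(ts + mids)


balanced = [i * RING // 4 for i in range(4)]
print(sorted(set(ownership(double(balanced)).values())))
# [Fraction(1, 8)]: 8 nodes, each owning exactly 1/8

one_more = balanced + [RING // 8]  # add a single node halfway between the first two
print(sorted(ownership(one_more).values()))
# two nodes own 1/8, the other three still own 1/4
```

Adding nodes one at a time only stays balanced if the existing tokens are moved as well, which is what the data-moving work discussed in this thread is about.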

On Thu, Oct 1, 2009 at 1:42 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

> You basically have two options.  You can wipe your data, change the
> tokens, and reload things, or you can add new nodes with -b to
> rebalance things that way.
>
> On Thu, Oct 1, 2009 at 12:34 PM, Igor Katkov <ikatkov@gmail.com> wrote:
> > OK, so I don't need to use tokenupdater. What are the steps to
> > rebalance data around the circle?
> >
> > In my test example (see below), I have A, D, B and C (clockwise) where
> > A holds 1/3 of the data
> > D - 1/6
> > B - 1/6
> > C - 1/3
> > I'm willing to change tokens manually, it's all right.
> > How do I tell all nodes to move data around in version 0.4? Do I change
> > the token on node A and restart it with -b, then do the same for the
> > rest, restarting only one node at a time?
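
The shares listed above can be checked with a short Python sketch. It assumes each node owns the arc (previous token, its token] and uses exact versions of the tokens from the original message (A = 0, D = R/6, B = R/3, C = 2R/3). Those decimal values equal sixths and thirds of R = 2**126, and only with that ring size do they give the 1/3, 1/6, 1/6, 1/3 split; the usual RandomPartitioner recipe divides 2**127 instead, so `RING` here is an assumption to check against your partitioner.

```python
RING = 2**126  # assumption: ring size under which the thread's tokens are evenly spaced

tokens = {"A": 0, "D": RING // 6, "B": RING // 3, "C": 2 * RING // 3}


def shares(tokens, ring=RING):
    """Fraction of the ring owned by each node: the arc (previous token, own token]."""
    names = sorted(tokens, key=tokens.get)
    return {
        # names[i - 1] wraps to the last node when i == 0
        name: ((tokens[name] - tokens[names[i - 1]]) % ring) / ring
        for i, name in enumerate(names)
    }


print({name: round(share, 4) for name, share in shares(tokens).items()})
# {'A': 0.3333, 'D': 0.1667, 'B': 0.1667, 'C': 0.3333}
```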
> >
> >
> >
> > On Thu, Oct 1, 2009 at 1:22 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> >>
> >> tokenupdater does not move data around; it's just an alternative to
> >> setting <initialtoken> on each node.  So you really want to get your
> >> tokens right for your initial set of nodes before adding data.
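
Picking evenly spaced tokens up front is simple arithmetic: node i of N gets i * R // N. A minimal Python sketch, assuming R = 2**127 (the token space usually used with RandomPartitioner); the function name `initial_tokens` is hypothetical, and the printed values would still have to be copied into each node's <initialtoken> setting by hand.

```python
RING = 2**127  # assumption: RandomPartitioner token space [0, 2**127)


def initial_tokens(n, ring=RING):
    """Evenly spaced tokens for n nodes: node i gets floor(i * ring / n)."""
    if n < 1:
        raise ValueError("need at least one node")
    return [i * ring // n for i in range(n)]


for i, token in enumerate(initial_tokens(4)):
    print(f"node {i}: {token}")
```

Multiplying by i before dividing keeps every gap within one unit of ring / n, whereas rounding ring / n first and then multiplying would let the rounding error accumulate.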
> >>
> >> We're finishing up full load balancing for 0.5, but even then it's best
> >> to start with a reasonable distribution instead of starting with
> >> random tokens and forcing the balancer to move things around a bunch.
> >>
> >> On Thu, Oct 1, 2009 at 12:14 PM, Igor Katkov <ikatkov@gmail.com> wrote:
> >> > What is the correct procedure for data re-partitioning?
> >> > Suppose I have 3 nodes - "A", "B", "C"
> >> > tokens on the ring:
> >> > A: 0
> >> > B: 2.8356863910078205288614550619314e+37
> >> > C: 5.6713727820156410577229101238628e+37
> >> >
> >> > Then I add node "D", token: 1.4178431955039102644307275309655e+37 (B/2)
> >> > Start node "D" with -b
> >> > Wait
> >> > Run nodeprobe -host hostB ... cleanup on live "B"
> >> > Wait
> >> > Done
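
The token for "D" above is the midpoint of the arc between "A" (token 0) and "B". A small Python sketch of that calculation for any two neighbouring tokens, including an arc that wraps past zero. `midpoint` is a hypothetical helper, and `RING` (the ring size, here 2**127 as in the usual RandomPartitioner token space) is an assumption; the exact value used for "B" is likewise an assumption behind the rounded decimal quoted above.

```python
RING = 2**127  # assumption: RandomPartitioner token space [0, 2**127)


def midpoint(prev_token, next_token, ring=RING):
    """Token halfway along the clockwise arc from prev_token to next_token."""
    gap = (next_token - prev_token) % ring or ring  # equal tokens: the whole ring
    return (prev_token + gap // 2) % ring


b = 2**126 // 3  # assumption: exact value behind the rounded token quoted for "B"
print(midpoint(0, b))  # same as b // 2, the token chosen for "D"
print(midpoint(3 * RING // 4, 0))  # arc that wraps past zero: 7 * RING // 8
```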
> >> >
> >> > Now data is not evenly balanced because tokens are not evenly spaced.
> >> > I see that there is tokenupdater (org.apache.cassandra.tools.TokenUpdater).
> >> > What happens to keys and data if I run it on "A", "B", "C" and "D"
> >> > with new, better-spaced tokens? Should I? Is there a better procedure?
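
As the reply above explains, tokenupdater only changes tokens; it does not move data. A Python sketch of how much of the ring would then be assigned to a different node, assuming each node owns the arc (previous token, its token], ignoring replicas, and placing tokens on a ring normalized to [0, 1). The "old" positions are the A, D, B, C layout from this thread; the evenly spaced "new" positions are a hypothetical retokenization that keeps the same order.

```python
from bisect import bisect_left
from fractions import Fraction as F


def owner(tokens, point):
    """Node whose arc (previous token, token] contains point.

    tokens maps node name -> position on a ring normalized to [0, 1).
    """
    names = sorted(tokens, key=tokens.get)
    i = bisect_left([tokens[n] for n in names], point)
    return names[i % len(names)]  # beyond the largest token: wrap to the smallest


def moved_fraction(old, new):
    """Fraction of the ring whose owning node differs between two token layouts."""
    cuts = sorted(set(old.values()) | set(new.values()) | {F(0), F(1)})
    moved = F(0)
    for lo, hi in zip(cuts, cuts[1:]):
        # No token lies strictly inside (lo, hi), so every point there has the
        # same owner as hi under each layout; probing hi is enough.
        if owner(old, hi) != owner(new, hi):
            moved += hi - lo
    return moved


old = {"A": F(0), "D": F(1, 6), "B": F(1, 3), "C": F(2, 3)}
new = {"A": F(0), "D": F(1, 4), "B": F(1, 2), "C": F(3, 4)}
print(moved_fraction(old, new))  # 1/3
```

Under this simplified model, a third of the key space would end up assigned to a node that never received its data.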
> >> >
> >> >
> >> >
> >> >
> >> > On Thu, Oct 1, 2009 at 12:48 PM, Jonathan Ellis <jbellis@gmail.com>
> >> > wrote:
> >> >>
> >> >> On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <ikatkov@gmail.com> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > Question#1:
> >> >> > How can I manually select tokens to force equal spacing of tokens
> >> >> > around the hash space?
> >> >>
> >> >> (Answered by Jun.)
> >> >>
> >> >> > Question#2:
> >> >> > Let's assume that #1 was resolved somehow and key distribution is
> >> >> > more or less even.
> >> >> > A new node "C" joins the cluster. Its token falls somewhere between
> >> >> > two other tokens on the ring (from nodes "A" and "B",
> >> >> > clockwise-ordered). From now on "C" is responsible for a portion of
> >> >> > data that used to belong exclusively to "B".
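
A Python sketch of the situation described here, assuming RandomPartitioner-style hashing (MD5 of the key read as a signed 128-bit integer, absolute value taken; a simplified model, not Cassandra's code) and that each node owns the arc (previous token, its token]. The node tokens are hypothetical: A = 0, B = half the ring, and C joins at a quarter of the ring, between A and B.

```python
import hashlib
from bisect import bisect_left

RING = 2**127  # assumption: token space of a RandomPartitioner-like hash


def key_token(key):
    """Simplified RandomPartitioner-style token: |MD5(key) as signed 128-bit int|."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def owner(ring, token):
    """Node whose arc (previous token, node token] contains token."""
    names = sorted(ring, key=ring.get)
    i = bisect_left([ring[n] for n in names], token)
    return names[i % len(names)]  # past the largest token: wrap to the smallest


before = {"A": 0, "B": RING // 2}
after = {**before, "C": RING // 4}  # hypothetical token for the new node C

keys = [f"key{i}" for i in range(1000)]
moved = [k for k in keys if owner(before, key_token(k)) != owner(after, key_token(k))]
print(len(moved), "of", len(keys), "keys change owner")  # roughly a quarter
print({(owner(before, key_token(k)), owner(after, key_token(k))) for k in moved})
# every pair printed is ('B', 'C')
```

Only keys whose tokens fall in (0, RING // 4] change hands, and all of them move from B to C; A's keys are untouched.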
> >> >> > a. Cassandra v0.4 will not automatically transfer this data to "C",
> >> >> > will it?
> >> >>
> >> >> It will, if you start C with the -b ("bootstrap") flag.
> >> >>
> >> >> > b. Do all reads to these keys fail?
> >> >>
> >> >> No.
> >> >>
> >> >> > c. What happens to the data referenced by these keys on "B"? It will
> >> >> > never be accessed there, so it becomes garbage. Since nothing will
> >> >> > GC it, will it stick around forever?
> >> >>
> >> >> nodeprobe cleanup after the bootstrap completes will instruct B to
> >> >> throw out data that has been copied to C.
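
What cleanup does can be modelled as a filter over the rows a node holds: keep a row only if its token still falls in a range the node is responsible for. A minimal Python sketch, assuming each node owns the arc (previous token, its token] and ignoring replication (a real node would presumably also keep ranges it holds replicas for); `cleanup`, the tokens and the row data are all hypothetical.

```python
from bisect import bisect_left

RING = 2**127  # assumption: token space [0, 2**127)


def owner(ring, token):
    """Node whose arc (previous token, node token] contains token."""
    names = sorted(ring, key=ring.get)
    i = bisect_left([ring[n] for n in names], token)
    return names[i % len(names)]  # past the largest token: wrap to the smallest


def cleanup(node, rows, ring):
    """Drop rows (keyed by token) whose primary owner is no longer `node`."""
    return {token: row for token, row in rows.items() if owner(ring, token) == node}


ring = {"A": 0, "B": RING // 2, "C": RING // 4}  # C has bootstrapped between A and B
rows_on_b = {RING // 8: "row1", RING // 3: "row2", RING // 2: "row3"}
print(sorted(cleanup("B", rows_on_b, ring).values()))  # ['row2', 'row3']
```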
> >> >>
> >> >> > d. What happens to replicas of these keys?
> >> >>
> >> >> These are also handled by -b.
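
Replica placement shifts as well. A Python sketch assuming a rack-unaware placement in which a token's replicas are its owner plus the next RF - 1 nodes clockwise, with RF = 2 and the same hypothetical tokens (A = 0, B = half the ring, C joining at a quarter). The function `replicas` is illustrative, not Cassandra's API.

```python
from bisect import bisect_left

RING = 2**127  # assumption: token space [0, 2**127)


def replicas(ring, token, rf):
    """Owner of token plus the next rf - 1 distinct nodes clockwise (rack-unaware)."""
    names = sorted(ring, key=ring.get)
    i = bisect_left([ring[n] for n in names], token) % len(names)
    return [names[(i + k) % len(names)] for k in range(min(rf, len(names)))]


before = {"A": 0, "B": RING // 2}
after = {**before, "C": RING // 4}  # hypothetical token for the new node C

for label, token in (("1/8", RING // 8), ("3/8", 3 * RING // 8), ("3/4", 3 * RING // 4)):
    print(label, replicas(before, token, 2), "->", replicas(after, token, 2))
# 1/8 ['B', 'A'] -> ['C', 'B']
# 3/8 ['B', 'A'] -> ['B', 'A']
# 3/4 ['A', 'B'] -> ['A', 'C']
```

In this model A stops holding replicas of the range C took over, B stops replicating A's range, and C picks up both, so bootstrapping has to move replica data as well as primary data.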
> >> >>
> >> >> -Jonathan
> >> >
> >> >
> >
> >
>
