Is it true that it is no longer necessary to specify an initial token?
If so, how would you add a new node into a ring such that it guarantees
replicas are spread evenly across data centres? Is this achieved simply
by starting a new node in the opposite DC and watching the log for the
message that it's receiving requests, before bootstrapping the next
node? Or is it possible to bootstrap multiple nodes simultaneously
around the cluster and let Cassandra figure out the replica distribution
pattern?
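For what it's worth, the recipe from the Operations page — evenly spaced tokens with nodes alternating between data centres — can be sketched like this. This is illustrative Python, not Cassandra code; the 0..2**127 token space assumes the RandomPartitioner, and the function and DC names are made up:

```python
# Hypothetical sketch: compute evenly spaced initial_token values for
# the RandomPartitioner's 0 .. 2**127 ring and assign data centres in
# alternating order around it. Not a Cassandra API.

def initial_tokens(num_nodes):
    """Evenly spaced tokens around the 2**127 ring."""
    ring_size = 2 ** 127
    return [i * ring_size // num_nodes for i in range(num_nodes)]

def alternate_dcs(tokens, dcs=("DC1", "DC2")):
    """Pair each token with a data centre, alternating around the ring."""
    return [(t, dcs[i % len(dcs)]) for i, t in enumerate(tokens)]

for token, dc in alternate_dcs(initial_tokens(4)):
    print(dc, token)
```

With four nodes this yields tokens at 0, 2**125, 2**126, and 3*2**125, alternating DC1/DC2 — each DC's two nodes sit opposite each other, so each DC's replicas are balanced.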
I'm also curious about the distribution of keys across nodes. The talk
I've seen discusses how replicas are distributed around the cluster,
but since it's the number of keys on a node that really governs its
load (assuming all keys are retrieved with equal frequency), does the
load balancer also function to redistribute keys amongst the nodes?
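To make the key-load point concrete: with the RandomPartitioner a key's position on the ring is its MD5 hash, so the share of keys a node holds is roughly proportional to the size of its token range. A small illustrative sketch (plain Python, not Cassandra code; the hashing and ownership rule are simplified assumptions):

```python
# Illustrative only: map keys to nodes the way a consistent-hash ring
# does. A key belongs to the first node whose token is >= the key's
# token, wrapping around the ring.
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 127  # RandomPartitioner token space (simplified)

def key_token(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

def owner(token, node_tokens):
    """Index of the node owning `token` (node_tokens must be sorted)."""
    i = bisect_left(node_tokens, token)
    return i % len(node_tokens)  # wrap past the last token to node 0

# Four nodes with evenly spaced tokens:
node_tokens = [0, 2 ** 125, 2 ** 126, 3 * 2 ** 125]
counts = [0] * len(node_tokens)
for k in range(10000):
    counts[owner(key_token("key%d" % k), node_tokens)] += 1
# With evenly spaced tokens the per-node counts come out roughly equal.
```

So with evenly spaced tokens and uniformly hashed keys, the key count per node evens out on its own; it's unequal token ranges (or an order-preserving partitioner) that would leave some nodes with more keys than others.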
Robin.
On Tue, 2010-02-09 at 10:21 -0600, Jonathan Ellis wrote:
> On Tue, Feb 9, 2010 at 3:13 AM, Jaakko <rosvopaallikko@gmail.com> wrote:
> > What they probably should do, is to just
> > consider nodes in the DC they are booting to, and try to balance load
> > evenly in that DC.
>
> I'm not sure what problem that would solve. It seems to me there are two goals:
>
> 1. don't transfer data across data centers
> 2. improve ring balance when you add nodes
>
> (1) should always be the case no matter where on the ring the node is
> since there will be at least one replica of each range in each DC.
>
> (2) is where we get into trouble here no matter which DC we add to.
> (a) if we add to G's DC, X will get all the replicas G has, remaining
> unbalanced
> (b) if we add to the other DC, G will still be hit from all the
> replicas from the other DC
>
> So ISTM that the only real solution is to do what we say in the
> Operations page, and make sure that nodes on the ring alternate DCs.
> I don't think only considering nodes in the same DC helps with that.
>
> Jonathan
