incubator-cassandra-dev mailing list archives

From Robin Coe <robin....@bluecoat.com>
Subject Re: loadbalance and different strategies
Date Tue, 09 Feb 2010 19:48:27 GMT
Thanks for the link, Stu.

So, from what I gather, initial tokens are required for seed nodes,
which then govern how keys are distributed across the cluster, implying
that the load balancer does not perform any key redistribution.  Does
the architecture allow for automatic key redistribution, or does the
MD5 hashing of keys give a high enough probability that keys will be
evenly distributed?
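For what it's worth, the even-distribution intuition is easy to check with a
quick simulation (a sketch only, not Cassandra code; the node count and key
names are made up): hash keys with MD5, map each digest onto a ring of evenly
spaced tokens, and count keys per node.

```python
import hashlib

RING_BITS = 127        # RandomPartitioner's token space is 0 .. 2**127
NUM_NODES = 4
NUM_KEYS = 100_000

# Evenly spaced tokens, as recommended for a balanced ring.
tokens = sorted(i * (2**RING_BITS // NUM_NODES) for i in range(NUM_NODES))

def node_for(key: str) -> int:
    """Return the index of the node owning this key's MD5 token."""
    token = int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**RING_BITS
    # The owner is the first node whose token is >= the key's token,
    # wrapping around to the first node on the ring.
    for i, t in enumerate(tokens):
        if token <= t:
            return i
    return 0

counts = [0] * NUM_NODES
for k in range(NUM_KEYS):
    counts[node_for(f"key-{k}")] += 1

for i, c in enumerate(counts):
    print(f"node {i}: {c} keys ({100 * c / NUM_KEYS:.1f}%)")
```

With uniform hashing each node ends up with roughly a quarter of the keys,
which is the "decent probability" the question is getting at, per row key at
least; hot keys are a different problem.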

Given the current implementation, let's say you determine that your
keys aren't evenly distributed, so you want to change your tokens
instead of adding a new node.  When you issue the nodeprobe flush
command, does that disable all incoming write requests for that node?
If so, are read requests also turned away, or will the node continue to
service reads until the process is killed?

Are there any side effects from taking down an existing cluster,
changing the tokens and restarting, other than the redistribution of
data that will occur?
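If you did go the restart route, the tokens themselves are simple to compute.
This is a sketch assuming RandomPartitioner's 0 .. 2**127 token space; the DC
names are illustrative, and the alternating assignment is just the pattern the
Operations page recommends, not anything Cassandra does for you.

```python
def balanced_tokens(num_nodes: int) -> list[int]:
    """Evenly spaced initial tokens for RandomPartitioner (0 .. 2**127)."""
    return [i * (2**127 // num_nodes) for i in range(num_nodes)]

# Alternate consecutive ring positions between the two data centres so
# that adjacent token ranges never live in the same DC.
dcs = ["DC1", "DC2"]
for pos, token in enumerate(balanced_tokens(6)):
    print(f"InitialToken={token}  ->  {dcs[pos % 2]}")
```

Each node's computed token would go into its storage-conf before the restart;
the data shuffle you mention is then just each node streaming its new ranges.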

Thanks,
Robin.

On Tue, 2010-02-09 at 13:16 -0600, Stu Hood wrote:

> The 'Ring management' and 'Range changes' sections of the wiki have gotten a lot better
> recently, and answer these questions. Specifically, look on that page for 'autobootstrap'.
> 
> http://wiki.apache.org/cassandra/Operations#Ring_management
> 
> Thanks,
> Stu
> 
> 
> -----Original Message-----
> From: "Robin Coe" <robin.coe@bluecoat.com>
> Sent: Tuesday, February 9, 2010 12:58pm
> To: cassandra-dev@incubator.apache.org
> Subject: Re: loadbalance and different strategies
> 
> Is it true that it is no longer necessary to specify an initial token?
> If so, how would you add a new node into a ring such that it guarantees
> replicas are spread evenly across data centres?  Is this achieved simply
> by starting a new node in the opposite DC and watching the log for the
> message that it's receiving requests, before bootstrapping the next
> node?  Or is it possible to bootstrap multiple nodes simultaneously
> around the cluster and let Cassandra figure out the replica distribution
> pattern?
> 
> I'm also curious about the distribution of keys across nodes.  The talk
> I've seen discusses how replicas are distributed around the cluster but
> since it's the number of keys on a node that really governs its load,
> assuming all keys are retrieved with equal frequency, does the load
> balancer also function to redistribute keys amongst the nodes? 
> 
> Robin.
> 
> On Tue, 2010-02-09 at 10:21 -0600, Jonathan Ellis wrote:
> 
> > On Tue, Feb 9, 2010 at 3:13 AM, Jaakko <rosvopaallikko@gmail.com> wrote:
> > > What they probably should do, is to just
> > > consider nodes in the DC they are booting to, and try to balance load
> > > evenly in that DC.
> > 
> > I'm not sure what problem that would solve.  It seems to me there are two goals:
> > 
> >  1. don't transfer data across data centers
> >  2. improve ring balance when you add nodes
> > 
> > (1) should always be the case no matter where on the ring the node is
> > since there will be at least one replica of each range in each DC.
> > 
> > (2) is where we get into trouble here no matter which DC we add to.
> >  (a) if we add to G's DC, X will get all the replicas G has, remaining
> > unbalanced
> >  (b) if we add to the other DC, G will still be hit from all the
> > replicas from the other DC
> > 
> > So ISTM that the only real solution is to do what we say in the
> > Operations page, and make sure that nodes on the ring alternate DCs.
> > I don't think only considering nodes in the same DC helps with that.
> > 
> > -Jonathan
> 