cassandra-dev mailing list archives

From "Coe, Robin" <robin....@bluecoat.com>
Subject RE: loadbalance and different strategies
Date Wed, 10 Feb 2010 00:40:48 GMT
I probably should have separated my questions; the question about 'nodeprobe flush' was based
on emails I remember seeing some time ago about clearing the commit log so that data could be
moved across nodes.  I couldn't find any information about what effect that command has on a
node's handling of incoming read/write requests.

Am I correct in assuming that a node given the flush command will not accept new writes; does
issuing a flush command effectively remove the node from the cluster, as far as writes go?
I expect reads could still succeed even after the flush command is executed, but I was hoping
for a better understanding of the behaviour of a node that is in a flush state.

Thanks,
Robin.

-----Original Message-----
From: Stu Hood [mailto:stu.hood@rackspace.com]
Sent: Tue 09/02/2010 12:24
To: cassandra-dev@incubator.apache.org
Cc: cassandra-dev@incubator.apache.org; cassandra-users@bluecoat.com
Subject: Re: loadbalance and different strategies
 
In 0.5, nodes can be automatically rebalanced one at a time using the 'nodetool loadbalance'
command, mentioned on that page (although, admittedly, it is in the wrong section).

'nodetool flush' has nothing to do with key distribution: it is a local operation.
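
For illustration, the two commands might look something like this (just a sketch: it assumes
the 0.5-era bin/nodeprobe wrapper and its -host option, and the host addresses and keyspace
name are placeholders; the wrapper is called nodetool in newer builds, so check the usage
output of whichever tool your install ships):

    # Flush is local: it writes the node's in-memory memtables out to SSTables
    # on disk. The node keeps serving reads and writes, and no data moves
    # between nodes.
    bin/nodeprobe -host 10.0.0.1 flush Keyspace1

    # Loadbalance is roughly decommission + re-bootstrap: the node hands its
    # ranges off to its neighbours, then rejoins at a token chosen to take
    # load from the most heavily loaded node. Run it one node at a time.
    bin/nodeprobe -host 10.0.0.2 loadbalance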

> Are there any side effects from taking down an existing cluster,
> changing the tokens and restarting
You would not want to do this, unless your replication factor was high enough that every node
had a replica of every other node's data. Use the instructions on the Operations page instead.
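
A live rebalance along those lines could look roughly like the following (again only a sketch
with placeholder hosts, assuming the same 0.5-era nodeprobe/nodetool commands):

    # Inspect current token ownership.
    bin/nodeprobe -host 10.0.0.1 ring

    # Rebalance the most overloaded node (decommission + re-bootstrap under
    # the hood), then re-check the ring before touching the next node.
    bin/nodeprobe -host 10.0.0.2 loadbalance
    bin/nodeprobe -host 10.0.0.1 ring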

Thanks,
Stu

-----Original Message-----
From: "Robin Coe" <robin.coe@bluecoat.com>
Sent: Tuesday, February 9, 2010 1:48pm
To: cassandra-dev@incubator.apache.org
Cc: cassandra-users@bluecoat.com
Subject: Re: loadbalance and different strategies

Thanks for the link, Stu.

So, from what I gather, initial tokens are required for seed nodes,
which then govern how keys are distributed across the cluster, implying
that the load balancer does not perform any key redistribution function.
Does the possibility for automatic key redistribution exist in the
architecture or does the md5 hashing of keys provide a decent
probability that keys will be evenly distributed?
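
(For reference: the md5-based RandomPartitioner maps keys into the token range 0..2^127, so a
perfectly even split of an N-node ring assigns node i the initial token i * 2^127 / N. A quick
sketch of the arithmetic, assuming a hypothetical 4-node cluster:)

    N=4
    for i in $(seq 0 $((N - 1))); do
        # initial token for node i: i * 2^127 / N
        echo "$i * 2^127 / $N" | bc
    done
    # prints:
    # 0
    # 42535295865117307932921825928971026432
    # 85070591730234615865843651857942052864
    # 127605887595351923798765477786913079296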

Given the current implementation, let's say you determine that your keys
aren't evenly distributed, so you want to change your tokens instead
of adding a new node.  When you issue the nodeprobe flush command, does
that disable all incoming write requests for that node?  If so, are read
requests also turned away, or will the node continue to service reads
until the process is killed?

Are there any side effects from taking down an existing cluster,
changing the tokens and restarting, other than the redistribution of
data that will occur?

Thanks,
Robin.

On Tue, 2010-02-09 at 13:16 -0600, Stu Hood wrote:

> The 'Ring management' and 'Range changes' sections of the wiki have gotten a lot better
> recently, and answer these questions. Specifically, look on that page for 'autobootstrap'.
> 
> http://wiki.apache.org/cassandra/Operations#Ring_management
> 
> Thanks,
> Stu
> 
> 
> -----Original Message-----
> From: "Robin Coe" <robin.coe@bluecoat.com>
> Sent: Tuesday, February 9, 2010 12:58pm
> To: cassandra-dev@incubator.apache.org
> Subject: Re: loadbalance and different strategies
> 
> Is it true that it is no longer necessary to specify an initial token?
> If so, how would you add a new node into a ring such that it guarantees
> replicas are spread evenly across data centres?  Is this achieved simply
> by starting a new node in the opposite DC and watching the log for the
> message that it's receiving requests, before bootstrapping the next
> node?  Or is it possible to bootstrap multiple nodes simultaneously
> around the cluster and let Cassandra figure out the replica distribution
> pattern?
> 
> I'm also curious about the distribution of keys across nodes.  The talk
> I've seen discusses how replicas are distributed around the cluster, but
> since it's the number of keys on a node that really governs its load,
> assuming all keys are retrieved with equal frequency, does the load
> balancer also function to redistribute keys amongst the nodes?
> 
> Robin.
> 
> On Tue, 2010-02-09 at 10:21 -0600, Jonathan Ellis wrote:
> 
> > On Tue, Feb 9, 2010 at 3:13 AM, Jaakko <rosvopaallikko@gmail.com> wrote:
> > > What they probably should do, is to just
> > > consider nodes in the DC they are booting to, and try to balance load
> > > evenly in that DC.
> > 
> > I'm not sure what problem that would solve.  It seems to me there are two goals:
> > 
> >  1. don't transfer data across data centers
> >  2. improve ring balance when you add nodes
> > 
> > (1) should always be the case no matter where on the ring the node is
> > since there will be at least one replica of each range in each DC.
> > 
> > (2) is where we get into trouble here no matter which DC we add to.
> >  (a) if we add to G's DC, X will get all the replicas G has, remaining
> > unbalanced
> >  (b) if we add to the other DC, G will still be hit from all the
> > replicas from the other DC
> > 
> > So ISTM that the only real solution is to do what we say in the
> > Operations page, and make sure that nodes on the ring alternate DCs.
> > I don't think only considering nodes in the same DC helps with that.
> > 
> > -Jonathan