incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: How tokens work?
Date Mon, 01 Aug 2011 01:16:30 GMT
The recommended approach is for all nodes in a Cassandra cluster to have the same HW spec.
If they do not, then you need to treat every node as having the lowest spec in the cluster (i.e. the
lowest memory, lowest CPU, lowest disk capacity and throughput). Other than during a HW upgrade,
running mixed HW nodes will make your life more complicated.

The token (and so the token range) for a node is only part of the story. A node will store
rows that fall in its token range. It will also store rows for RF-1 other token ranges. So
for RF 3, node D in your example is responsible for storing rows in the ranges for nodes B,
C and D. It is a replica for the ranges (0, 20], (20, 70] and (70, 90].
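
To make the replica ranges concrete, here is a minimal Python sketch. It assumes simple
ring placement (each range is also stored on the next RF-1 nodes around the ring, with no
rack or data centre awareness) and that a node owns the range from the previous node's
token (exclusive) up to its own token (inclusive). The name replica_ranges is made up for
illustration and is not part of Cassandra.

```python
def replica_ranges(tokens, rf):
    """Return, for each node, the (start, end] token ranges it stores.

    tokens: dict of node name -> initial token (illustrative input format).
    rf: replication factor.
    Assumes simple ring placement; this is not how Cassandra is implemented.
    """
    ring = sorted(tokens.items(), key=lambda item: item[1])
    n = len(ring)
    result = {}
    for i, (name, _) in enumerate(ring):
        ranges = []
        # The node's own range, then the ranges of the rf - 1 nodes before it.
        for back in range(min(rf, n)):
            owner = (i - back) % n
            start = ring[(owner - 1) % n][1]  # previous node's token, exclusive
            end = ring[owner][1]              # owner's token, inclusive
            ranges.append((start, end))
        result[name] = ranges
    return result


example = replica_ranges({"A": 0, "B": 20, "C": 70, "D": 90}, rf=3)
print(example["D"])
```

For node D this yields (70, 90], (20, 70] and (0, 20], the same three ranges listed above.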

Deliberately setting an unbalanced ring also sounds like a recipe for pain. 
 
Start out with a well balanced token ring and then make changes if your load testing or production
use shows it's necessary. If you are using mixed HW, consider working off the lowest combined spec.
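
As a rough illustration of a well balanced ring: with RandomPartitioner the usual approach
is to space the initial tokens evenly over the token space. The sketch below assumes a
token space of 2^127; the helper names balanced_tokens and range_widths are hypothetical,
and the width calculation assumes each node owns the range from the previous token
(exclusive) up to its own token (inclusive).

```python
RING_SIZE = 2 ** 127  # assumed size of the RandomPartitioner token space


def balanced_tokens(node_count, ring_size=RING_SIZE):
    """Evenly spaced initial tokens, one per node."""
    return [i * ring_size // node_count for i in range(node_count)]


def range_widths(tokens, ring_size=RING_SIZE):
    """Width of the primary range each token owns: (previous token, token]."""
    ordered = sorted(tokens)
    widths = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps around to the last token
        # A width of 0 only happens with a single node, which owns the whole ring.
        widths[token] = (token - previous) % ring_size or ring_size
    return widths


for token in balanced_tokens(4):
    print(token)
```

Running range_widths over the output of balanced_tokens should give every node the same
width, which is a quick way to check a proposed set of tokens before putting them in the
config.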

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 1 Aug 2011, at 12:24, Rafael Almeida wrote:

> On Saturday, July 30, 2011, Rafael Almeida <almeidaraf@yahoo.com> wrote:
>> Hello,
>>  
>> I have computers that are better than others in my cluster. In particular,
>> there's one which is much better and I'd like to give it more load than the
>> others. Is it possible? I'm using RandomPartitioner; should I use another one?
>> Should I select tokens in some particular way? How is load distribution
>> implemented in RandomPartitioner with respect to tokens?
>>  
> 
> I'm answering myself this time. I think I've got things figured out, at least
> for RandomPartitioner. The token space goes from 0 to 2^127. There are 2^127
> tokens possible. The load a node will receive is proportional to the number of
> tokens assigned to it. If you assign 2^127 / 2 tokens to a node, it will be
> responsible for half the load in the system. If you assign 2^127 / 3 tokens to a
> node, it will be responsible for 1/3 of the load, and so on.
> 
> But you assign only one token in Cassandra's configuration file! True, but
> that's the first token for that node, in a range of tokens it will accept. The
> number of tokens actually assigned to it is the range from the value you wrote
> in initial_token in cassandra.yaml up to the next token.
> 
> I find it hard to explain that without an example. So, let's say the token space
> is actually from 0 to 100 and we have 4 nodes (let's do this in order to make
> things more manageable). In our example, we have the following initial_tokens:
> 
> node A = 0
> node B = 20
> node C = 70
> node D = 90
> 
> Node A would have 20 - 0 = 20 tokens assigned to it (20/100 = 20% of the load). Node
> B would have 70 - 20 = 50 tokens assigned to it (50% of the load). Node C would
> have 90 - 70 = 20 tokens assigned to it (20% of the load) and, finally, node D
> would have 100 - 90 = 10 tokens assigned to it (10% of the load). See how that works?
> 
> Now suppose you mess up your configuration. Let's say you set up initial_token like
> this:
> 
> node A = 10
> node B = 20
> node C = 70
> node D = 90
> 
> That way you'd have 10 unhandled tokens. I think Cassandra detects it and sets
> things up in a way that no token is missing. But I'm not sure what it does exactly.
> I've tested it with two nodes and, when I use such an invalid configuration, I get
> each node handling 50% of the load.
> 
> I hope I've been clear. Please correct me if I misunderstood something.
> 

