On Mon, Aug 1, 2011 at 8:24 AM, Rafael Almeida <almeidaraf@yahoo.com> wrote:
On Saturday, July 30, 2011, Rafael Almeida <almeidaraf@yahoo.com> wrote:
> Hello,

> I have computers that are better than others in my cluster. In particular,
> there's one which is much better and I'd like to give it more load than the
> others.  Is it possible? I'm using RandomPartitioner, should I use another
> one? Should I select tokens in some particular way? How is load distribution
> implemented in RandomPartitioner with respect to tokens?


I'm answering myself this time. I think I've got things figured out, at least
for RandomPartitioner. The token space goes from 0 to 2^127, so there are about
2^127 possible tokens. The load a node will receive is proportional to the
number of tokens assigned to it. If you assign 2^127 / 2 tokens to a node, it
will be responsible for half the load in the system. If you assign 2^127 / 3
tokens to a node, it will be responsible for 1/3 of the load, and so on.
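
To make the arithmetic concrete, here is a minimal sketch in Python. It is not
Cassandra code: share_of_load and tokens_for_share are names I made up, and it
assumes a ring of 2^127 tokens (RandomPartitioner) and that load really is
spread evenly over the ring.

```python
from fractions import Fraction

# Assumption: RandomPartitioner's token space has 2**127 positions.
RING_SIZE = 2**127


def share_of_load(num_tokens, ring_size=RING_SIZE):
    """Fraction of the ring (and so, roughly, of the load) covered by num_tokens."""
    return Fraction(num_tokens, ring_size)


def tokens_for_share(share, ring_size=RING_SIZE):
    """Number of tokens a node needs to cover `share` of the ring (rounded down)."""
    return int(Fraction(share) * ring_size)


half = tokens_for_share(Fraction(1, 2))
third = tokens_for_share(Fraction(1, 3))
print(half, share_of_load(half))
print(third, float(share_of_load(third)))
```

Fractions are used instead of floats because 2^127 does not fit in a float
without rounding.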

But you assign only one token in cassandra's configuration file! True, but
that token marks the end of the range of tokens the node will accept. The
tokens actually assigned to a node are the ones after the previous node's
initial_token (in ring order) up to and including the value you wrote in
initial_token in cassandra.yaml.

I find it hard to explain that without an example. So, let's say the token space
is actually from 0 to 100 and we have 4 nodes (just to make things more
manageable). In our example, we have the following initial_tokens:

node A = 0
node B = 20
node C = 70
node D = 90

Node B would have 20 - 0 = 20 tokens assigned to it (20/100 = 20% of the load).
Node C would have 70 - 20 = 50 tokens assigned to it (50% of the load). Node D
would have 90 - 70 = 20 tokens assigned to it (20% of the load) and, finally,
node A would wrap around the end of the token space and have the remaining 10%
of the tokens assigned to it. See how that works?
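
A quick way to check a set of initial_tokens is to compute the share each node
ends up with. The following is only a sketch, not something taken from
Cassandra: ownership is a made-up helper, and it assumes each node owns the
range from the previous node's token (exclusive) up to its own token
(inclusive), wrapping around the end of the ring.

```python
def ownership(initial_tokens, ring_size):
    """Map node name -> fraction of the ring it owns.

    Assumption: a node owns (previous token, own token], wrapping around.
    """
    ordered = sorted(initial_tokens.items(), key=lambda item: item[1])
    shares = {}
    for i, (node, token) in enumerate(ordered):
        # For i == 0 the index -1 picks the last node, which gives the wrap-around.
        prev_token = ordered[i - 1][1]
        width = (token - prev_token) % ring_size
        if width == 0:
            # Only happens with a single node: it owns the whole ring.
            width = ring_size
        shares[node] = width / ring_size
    return shares


# Toy ring of 100 tokens, same initial_tokens as in the example.
tokens = {"A": 0, "B": 20, "C": 70, "D": 90}
for node, share in ownership(tokens, 100).items():
    print(node, f"{share:.0%}")
```

To give a stronger machine more load, you would move its initial_token further
away from the token that comes before it on the ring.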
 
Now suppose you mess up your configuration. Let's say you set up initial_token
like this:

node A = 10
node B = 20
node C = 70
node D = 90

That way you'd have 10 unhandled tokens. I think cassandra detects it and sets
things up in such a way that no token is missing, but I'm not sure what it does
exactly. I've tested it with two nodes and, when I make such an invalid
configuration, I get each node handling 50% of the load.

 
There would be no missing tokens: node A will take care of the token ranges (90, 100] and [0, 10].
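
The wrap-around can be spelled out with a small sketch. Again this is only an
illustration on the toy 0 to 100 ring, not Cassandra's implementation:
owned_ranges is a hypothetical helper, and it assumes each node owns the tokens
after the previous node's token up to and including its own.

```python
def owned_ranges(initial_tokens, ring_size):
    """Map node name -> list of token ranges it owns, written as interval strings.

    Assumption: a node owns (previous token, own token]; the node with the
    lowest token also picks up the tail of the ring past the highest token.
    """
    ordered = sorted(initial_tokens.items(), key=lambda item: item[1])
    ranges = {}
    for i, (node, token) in enumerate(ordered):
        # For i == 0 the index -1 picks the node with the highest token.
        prev_token = ordered[i - 1][1]
        if prev_token < token:
            ranges[node] = [f"({prev_token}, {token}]"]
        else:
            # The range wraps past the end of the ring and continues from 0.
            ranges[node] = [f"({prev_token}, {ring_size}]", f"[0, {token}]"]
    return ranges


tokens = {"A": 10, "B": 20, "C": 70, "D": 90}
for node, pieces in owned_ranges(tokens, 100).items():
    print(node, " and ".join(pieces))
```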

I hope I've been clear. Please correct me if I misunderstood something.