cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Rebalance cluster
Date Sun, 15 Jan 2012 17:52:23 GMT
If you can handle the load without the slower machines, and you are still meeting your redundancy
requirements, removing them may make your life easier. Otherwise you have to consider that
your cluster is made up of machines with the worst parts from all of the nodes (i.e. lowest
memory, slowest CPU, etc).
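A back-of-envelope sketch of what removing nodes means for the survivors. It assumes a balanced ring (equal token ownership) and RF 3, as in the 5-node example further down this thread. Under those assumptions each node stores RF/N of the data, capped at 100%. The function name is hypothetical.

```python
def data_share_per_node(num_nodes: int, rf: int) -> float:
    """Fraction of the full data set each node stores in a balanced ring.

    Assumes equal token ownership and that every row is stored on
    min(rf, num_nodes) distinct nodes.
    """
    return min(rf, num_nodes) / num_nodes


# 5 nodes with RF 3, then the same cluster with two machines removed.
for nodes in (5, 3):
    share = data_share_per_node(nodes, rf=3)
    print(f"{nodes} nodes, RF 3: each node stores {share:.0%} of the data")
```

With three nodes left and RF 3, every remaining node holds a full copy of the data, so each one has to be able to carry that on its own.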

Cheers
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 13/01/2012, at 5:42 AM, Daning Wang wrote:

> Thank you, guys. Much appreciated.
> 
> How about just pulling the slow machines out of the cluster? I think most of the reads are already being served by the fast machines because of the dynamic snitch, so removing the two machines should not add much load to the remaining nodes.
> 
> What do you think?
> 
> Thanks,
> 
> Daning
> 
> On Wed, Jan 11, 2012 at 1:34 PM, Antonio Martinez <antypdet@gmail.com> wrote:
> There is another possible approach, which I take from the original Dynamo paper. Instead of trying to manage a heterogeneous cluster at the Cassandra level, it might be possible to take the approach Amazon took: find the lowest common denominator of resources across your nodes (most likely your smallest node) and virtualize the others to that level. For example, say you have 3 physical computers: one with 1 processor and 2 GB of memory, one with 2 processors and 4 GB, and one with 4 processors and 8 GB. You could make the smallest one your basic block, then put two 1-processor/2 GB VMs on the second machine and four of those on the third and largest machine. Then, instead of managing the three of them separately and worrying about them being different, you manage a ring of 7 equal nodes with equal portions of the ring. This gives the smaller machines a lighter load than the more powerful ones. The Amazon paper on Dynamo has more information on how they did it and some of the tricks they use for reliability.
> http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
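Antonio's sizing arithmetic can be sketched in a few lines. This is a minimal sketch, assuming each host is described only by CPU count and memory, and that the "unit" VM takes the smallest CPU count and the smallest memory across all hosts. `Host` and `vms_per_host` are hypothetical names, not part of Cassandra or Dynamo.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpus: int
    memory_gb: int


def vms_per_host(hosts: list[Host]) -> dict[str, int]:
    """Number of equal unit VMs each host can carry.

    The unit VM gets the smallest CPU count and the smallest memory seen
    across all hosts; each host fits as many unit VMs as its scarcer
    resource allows.
    """
    unit_cpus = min(h.cpus for h in hosts)
    unit_mem = min(h.memory_gb for h in hosts)
    return {
        h.name: min(h.cpus // unit_cpus, h.memory_gb // unit_mem)
        for h in hosts
    }


# The three example machines from the paragraph above.
hosts = [Host("small", 1, 2), Host("medium", 2, 4), Host("large", 4, 8)]
counts = vms_per_host(hosts)
total = sum(counts.values())
for name, n in counts.items():
    # Every VM owns 1/total of the ring, so a host's share is n/total.
    print(f"{name}: {n} VM(s), {n / total:.1%} of the ring")
```

This reproduces the 1 + 2 + 4 = 7 equal ring members from the example. It only does the sizing arithmetic; it says nothing about which VMs end up holding replicas of the same data.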
> 
> Hope this helps somewhat
> 
> On Wed, Jan 11, 2012 at 2:00 PM, aaron morton <aaron@thelastpickle.com> wrote:
> I have good news and bad. 
> 
> The good news is I have a nice coffee. The bad news is it's pretty difficult to have some nodes with less load.
> 
> In a cluster with 5 nodes and RF 3, each node holds the following token ranges:
> 
> node 1: ranges of nodes 1, 5, 4
> node 2: ranges of nodes 2, 1, 5
> node 3: ranges of nodes 3, 2, 1
> node 4: ranges of nodes 4, 3, 2
> node 5: ranges of nodes 5, 4, 3
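The placement rule in this table (a node stores its own range plus the ranges of the RF-1 nodes before it on the ring) can be sketched as follows. This is a minimal model of the rule as described here, with nodes numbered 1..N in ring order; `replicated_ranges` is a hypothetical helper, not a Cassandra API.

```python
def replicated_ranges(num_nodes: int, rf: int) -> dict[int, list[int]]:
    """Map each node (1-based) to the ranges it stores.

    Node i stores range i plus the ranges of the rf - 1 nodes that
    precede it on the ring, wrapping around at node 1.
    """
    placement = {}
    for node in range(1, num_nodes + 1):
        placement[node] = [
            (node - 1 - k) % num_nodes + 1  # step back k positions, wrapping
            for k in range(rf)
        ]
    return placement


for node, ranges in replicated_ranges(5, 3).items():
    print(f"node {node}: ranges of nodes {', '.join(map(str, ranges))}")
```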
> 
> The load on each node is its own token range plus those of the preceding RF-1 nodes. E.g., in a balanced ring of 5 nodes with RF 3, each node owns 20% of the token ring and stores 60% of the total data.
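The same rule written as a formula, with $s_j$ the fraction of the token ring owned by node $j$, node indices taken modulo $N$, and $RF \le N$:

$$
\text{load}_i = \sum_{k=0}^{RF-1} s_{i-k},
\qquad
\sum_{i=1}^{N} \text{load}_i = RF \sum_{j=1}^{N} s_j = RF .
$$

With $N = 5$, $RF = 3$ and every $s_j = 20\%$, each node stores $3 \times 20\% = 60\%$. The second identity explains why it is hard to give some nodes less load: whatever the split, the loads always add up to $RF \times 100\%$ (here 300%), so lowering one node's load raises someone else's.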
> 
> If the token ring is split as below, each node has the total load shown after the /:
> 
> node 1: 12.5% / 50%
> node 2: 25% / 62.5%
> node 3: 25% / 62.5%
> node 4: 12.5% / 62.5%
> node 5: 25% / 62.5%
> 
> Only node 1 gets a small amount less. Try a different approach…
> 
> node 1: 12.5% / 62.5%
> node 2: 12.5% / 50%
> node 3: 25% / 50%
> node 4: 25% / 62.5%
> node 5: 25% / 75%
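Both tables can be checked mechanically. This sketch applies the rule above (a node stores its own range plus the ranges of the RF-1 nodes before it) to the two splits, using exact fractions so the percentages come out without rounding noise. `node_loads` and `pct` are hypothetical helpers.

```python
from fractions import Fraction


def node_loads(shares: list[Fraction], rf: int) -> list[Fraction]:
    """Share of the data each node stores, given each node's ring share.

    Node i (0-based here) stores its own range plus the ranges of the
    rf - 1 nodes before it, wrapping around the ring.
    """
    n = len(shares)
    return [sum(shares[(i - k) % n] for k in range(rf)) for i in range(n)]


def pct(x: Fraction) -> str:
    """Format a fraction as a percentage, e.g. 5/8 -> '62.5%'."""
    return f"{float(x) * 100:g}%"


# The two splits from the tables, nodes 1..5, as fractions of the ring.
eighth, quarter = Fraction(1, 8), Fraction(1, 4)
splits = {
    "first": [eighth, quarter, quarter, eighth, quarter],
    "second": [eighth, eighth, quarter, quarter, quarter],
}
for label, shares in splits.items():
    assert sum(shares) == 1  # each split must cover the whole ring
    loads = node_loads(shares, rf=3)
    print(label, [pct(x) for x in loads], "total", pct(sum(loads)))
```

In the second split node 5 comes out at exactly 75%, and in both splits the loads total 300%.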
> 
> That's even worse. 
> 
> David is right: use nodetool move. It's a good idea to update the initial tokens in the yaml (or your ops config) after the fact, even though they are not used.
> 
> Hope that helps.
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 12/01/2012, at 8:41 AM, David McNelis wrote:
> 
>> Daning,
>> 
>> You can see how to do this basic sort of thing on the wiki's Operations page (http://wiki.apache.org/cassandra/Operations).
>> 
>> In short, you'll want to run:
>> nodetool -h hostname move newtoken
>> 
>> Then, once you've updated each of the tokens you want to move, you'll want to run
>> nodetool -h hostname cleanup
>> 
>> That will remove data for the token ranges your smaller machines no longer own.
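A sketch of how the `newtoken` values for `nodetool move` could be computed for a chosen split. It assumes RandomPartitioner (the default in 0.8), whose tokens lie in 0 .. 2**127, and that a node owns the range ending at its own token. The hostnames `cass1`..`cass5` are hypothetical, `tokens_for_shares` is a hypothetical helper, and the example split is the first one from Aaron's tables. The code only prints commands; it does not run them.

```python
from fractions import Fraction

# Assumed: RandomPartitioner token space is 0 .. 2**127.
RING_SIZE = 2 ** 127


def tokens_for_shares(shares: list[Fraction]) -> list[int]:
    """Token for each node, in ring order, so that node i owns shares[i].

    A node owns (previous token, own token]. The first node sits at
    token 0 and owns the wrap-around range, so node i's token is the
    running total of the shares of nodes 2..i.
    """
    if sum(shares) != 1:
        raise ValueError("shares must add up to the whole ring")
    tokens, running = [], Fraction(0)
    for i, share in enumerate(shares):
        if i > 0:
            running += share
        tokens.append(running.numerator * RING_SIZE // running.denominator)
    return tokens


# Hypothetical hosts in ring order, with the first split from the tables.
hosts = ["cass1", "cass2", "cass3", "cass4", "cass5"]
shares = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 4),
          Fraction(1, 8), Fraction(1, 4)]
for host, token in zip(hosts, tokens_for_shares(shares)):
    print(f"nodetool -h {host} move {token}")
```

With equal shares this reduces to the usual balanced formula `i * 2**127 // N`. It only turns a split into tokens; as the load tables show, with RF 3 a smaller token share does not translate one-to-one into a smaller load.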
>> 
>> Please note that someone else may have better insights than I do into whether or not your strategy is going to be effective. On the surface I think what you are doing is logical, but I'm unsure of the actual performance gains you'll see.
>> 
>> David
>> 
>> On Wed, Jan 11, 2012 at 1:32 PM, Daning Wang <daning@netseer.com> wrote:
>> Hi All,
>> 
>> We have a 5-node cluster (on 0.8.6), but two machines are slower and have less memory, so performance on those two machines was poor under high traffic volume. I want to move some data from the slower machines to the faster ones to ease their load, so the token ring will not be evenly balanced.
>> 
>> I am thinking of the following steps:
>> 
>> 1. Modify cassandra.yaml to change the initial token.
>> 2. Restart Cassandra (no need to auto-bootstrap, right?).
>> 3. Then run nodetool repair (or nodetool move? Not sure which one to use).
>> 
>> 
>> Is there any doc with detailed steps on how to do this?
>> 
>> Thanks in advance,
>> 
>> Daning
>> 
>> 
> 
> 
> 
> 
> -- 
> Antonio Perez de Tejada Martinez
> 
> 

