incubator-cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: cassandra nodes with mixed hard disk sizes
Date Tue, 22 Mar 2011 11:21:26 GMT
My assumption comes from not seeing anything in the code that explicitly supports nodes of different specs (I also think I saw it stated somewhere ages ago). AFAIK the dynamic snitch is there to detect nodes with temporarily reduced throughput and to shift read load away from them.
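As I understand it, the idea is roughly a decaying latency score per node that is used to order read replicas. A minimal sketch of that idea (my own illustration, not the actual DynamicEndpointSnitch code; the decay factor is an assumed value):

    // Illustrative sketch only, not Cassandra's implementation: keep an
    // exponentially decayed latency score per replica and ask the
    // currently fastest-looking replicas first.
    import java.net.InetAddress;
    import java.util.*;
    import java.util.concurrent.ConcurrentHashMap;

    class LatencyScoredSnitch {
        // weight of the newest sample in the decayed average (assumed value)
        private static final double ALPHA = 0.75;
        private final Map<InetAddress, Double> scores = new ConcurrentHashMap<>();

        // record a read latency sample for a replica
        void receiveTiming(InetAddress node, double latencyMillis) {
            scores.merge(node, latencyMillis,
                    (old, fresh) -> old * (1 - ALPHA) + fresh * ALPHA);
        }

        // order replicas so the lowest-scoring (recently fastest) come first
        List<InetAddress> sortedByProximity(List<InetAddress> replicas) {
            List<InetAddress> sorted = new ArrayList<>(replicas);
            sorted.sort(Comparator.comparingDouble(
                    (InetAddress n) -> scores.getOrDefault(n, 0.0)));
            return sorted;
        }
    }

Anything that temporarily slows a node (compaction, repair, GC) pushes its score up and reads shift to the other replicas until it recovers; it is not a mechanism for permanently weighting nodes by capacity.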

I may be wrong on this, so anyone else feel free to jump in. Here are some issues to consider...

- Keyspace memory requirements are global: all nodes must have enough memory to support the CFs.
- During node moves, additions, or deletions a node's token range may increase; nodes with less total space than others make this more complicated.
- During a write the mutation is sent to all replicas, so a weak node that is a replica for a strong, busy node will be asked to store data from the strong node.
- Read repair reads from all replicas.
- When strong nodes that replicate to a weak node are compacting or repairing, the dynamic snitch may order them lower than the weak node, potentially increasing read requests on the weak one.
- Down time for a strong node (or a cluster partition) may result in increased read traffic to a weak node if all up replicas are needed to achieve the CL.
- Nodes store their own token range and act as replicas for the token ranges of RF-1 other nodes (see the sketch below).
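To make that last point concrete, here is a rough sketch of SimpleStrategy-style placement (my own illustration, not the actual replication strategy code): the replicas for a key are the node owning its token plus the next RF-1 distinct nodes clockwise on the ring.

    // Rough sketch of SimpleStrategy-style replica placement: walk the
    // ring clockwise from the key's token and take the first RF distinct
    // nodes. Assumes one token per node.
    import java.math.BigInteger;
    import java.util.*;

    class RingPlacement {
        private final TreeMap<BigInteger, String> ring = new TreeMap<>(); // token -> node

        void addNode(BigInteger token, String node) { ring.put(token, node); }

        List<String> replicasFor(BigInteger keyToken, int rf) {
            int targets = Math.min(rf, ring.size()); // cannot have more replicas than nodes
            List<String> replicas = new ArrayList<>();
            Iterator<String> it = ring.tailMap(keyToken).values().iterator();
            while (replicas.size() < targets) {
                if (!it.hasNext()) it = ring.values().iterator(); // wrap around the ring
                String node = it.next();
                if (!replicas.contains(node)) replicas.add(node);
            }
            return replicas;
        }
    }

So each node also holds copies of the ranges owned by the RF-1 nodes before it on the ring; shrinking a weak node's own range does not shrink the neighbouring ranges it has to replicate.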

Overall, when a node goes down the other nodes need to be able to handle the potential extra load (connections, reads, storing hinted handoffs). If you have some weak and some strong nodes there is a chance of the weak nodes being overwhelmed, which may reduce the availability of your cluster.

Hope that helps.
Aaron

On 22/03/2011, at 10:54 PM, Daniel Doubleday <daniel.doubleday@gmx.net> wrote:

> 
> On Mar 22, 2011, at 5:09 AM, aaron morton wrote:
>> 1) You should use nodes with the same capacity (CPU, RAM, HDD); Cassandra assumes they are all equal.
> 
> Care to elaborate? While equal nodes will certainly make life easier, I would have thought that the dynamic snitch would take care of performance differences, and manual assignment of token ranges can yield any data distribution. Obviously a node with twice as much data will probably get twice the load. But if that is no problem ...
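> For instance (a rough sketch of my own, assuming the RandomPartitioner's 0..2^127 ring and one token per node), initial tokens could be weighted by disk capacity:
> 
>     // Rough sketch, not from Cassandra: place each node's token so its
>     // slice of the ring is proportional to its disk size.
>     import java.math.BigInteger;
> 
>     class WeightedTokens {
>         static BigInteger[] tokensFor(long[] diskGb) {
>             BigInteger ringSize = BigInteger.valueOf(2).pow(127);
>             long total = 0;
>             for (long gb : diskGb) total += gb;
>             BigInteger[] tokens = new BigInteger[diskGb.length];
>             long cumulative = 0;
>             for (int i = 0; i < diskGb.length; i++) {
>                 cumulative += diskGb[i];
>                 // node i owns (tokens[i-1], tokens[i]], wrapping for node 0,
>                 // so its share of the ring tracks diskGb[i]
>                 tokens[i] = ringSize.multiply(BigInteger.valueOf(cumulative))
>                                     .divide(BigInteger.valueOf(total));
>             }
>             return tokens;
>         }
>     }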
> 
> Where does Cassandra assume that all nodes are equal?
> 
> Cheers Daniel
> 
> 
>> 
>> 2) Not sure exactly what would happen. Am guessing either the node would shut down or writes would eventually block, probably the former. If the node stayed up, read performance might suffer (if there were more writes being sent in). If you really want to know more, let me know and I may find time to dig into it.
>> 
>> Also, a node is responsible for storing its own token range and acting as a replica for other token ranges. So reducing the token range may not have a dramatic effect on the storage requirements.
>> 
>> Hope that helps. 
>> Aaron
>> 
>> On 22 Mar 2011, at 09:50, Jonathan Colby wrote:
>> 
>>> 
>>> This is a two part question ...
>>> 
>>> 1. If you have Cassandra nodes with different sized hard disks, how do you deal with assigning the token ring such that the nodes with larger disks get more data? In other words, given equally distributed token ranges, when the smaller-disk nodes run out of space the larger-disk nodes will still have unused capacity. Or is installing a mixed hardware cluster a no-no?
>>> 
>>> 2. What happens when a Cassandra node runs out of disk space for its data files? Does it continue serving the data while not accepting new data? Or does the node break and require manual intervention?
>>> 
>>> This info has eluded me elsewhere.
>>> Jon
>> 
> 

