cassandra-user mailing list archives

From Brooke Thorley <>
Subject Re: Re: tolerate how many nodes down in the cluster
Date Mon, 24 Jul 2017 22:26:35 GMT
Hello Peng.

I think spending the time to set up your nodes into racks is worth it for
the benefits that it brings. With RF=3 and NetworkTopologyStrategy (NTS)
across 3 racks, you can tolerate the loss of a whole rack of nodes without
losing QUORUM, as each rack will contain a full set of data. It makes ongoing
cluster maintenance easier, as you can perform upgrades, repairs and restarts
on a whole rack of nodes at once. Setting up racks or adding nodes is not
difficult, particularly if you are using vnodes. You would simply add nodes in
multiples of <num racks> to keep the racks balanced. This is how we run all
our managed clusters and it works very well.
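The "add nodes in multiples of <num racks>" rule can be expressed as a small helper. This is a minimal sketch for illustration only, not an Instaclustr or Cassandra tool; the function names and the dict-of-rack-sizes representation are hypothetical.

```python
def add_nodes_balanced(nodes_per_rack: dict[str, int], new_nodes: int) -> dict[str, int]:
    """Hypothetical helper: spread new nodes evenly so every rack keeps the same size."""
    num_racks = len(nodes_per_rack)
    if new_nodes % num_racks != 0:
        # With one full replica set per rack, a smaller rack means each of its
        # nodes holds a larger share of the data than nodes in other racks.
        raise ValueError(f"add nodes in multiples of {num_racks} to keep racks balanced")
    per_rack = new_nodes // num_racks
    return {rack: count + per_rack for rack, count in nodes_per_rack.items()}


def is_balanced(nodes_per_rack: dict[str, int]) -> bool:
    """True if every rack has the same number of nodes."""
    return len(set(nodes_per_rack.values())) <= 1


if __name__ == "__main__":
    cluster = {"rack1": 10, "rack2": 10, "rack3": 10}
    grown = add_nodes_balanced(cluster, 3)
    print(grown)               # {'rack1': 11, 'rack2': 11, 'rack3': 11}
    print(is_balanced(grown))  # True
```

Rejecting an unbalanced request outright, rather than rounding, keeps the rule as strict as stated: 2 new nodes across 3 racks raises an error instead of silently leaving one rack short.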

You may be interested in watching my Cassandra Summit presentation from last
year, in which I discussed this very topic (from 4:00).

If you were to consider changing your rack topology, I would recommend that
you do this by DC migration rather than "in place".

Kind Regards,
*Brooke Thorley*
*VP Technical Operations & Customer Services*

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

On 25 July 2017 at 03:06, Anuj Wadehra <> wrote:

> Hi Peng,
> Three things are important when you are evaluating fault tolerance and
> availability for your cluster:
> 1. RF (replication factor)
> 2. CL (consistency level)
> 3. Topology: how data is replicated across racks.
> If you assume that N nodes from ANY rack may fail at the same time, then
> you can afford the failure of RF - CL nodes (counting CL as the number of
> replicas it requires) and still be 100% available. E.g., if you are reading
> at QUORUM with RF=3, you can only afford one (3 - 2) node failure. Thus, even
> if you have a 30-node cluster, you cannot stay 100% available through a
> 10-node failure. Availability depends on RF rather than on the total number
> of nodes in the cluster.
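The RF - CL rule above, with CL counted as the number of replicas the consistency level needs, works out as follows. This is a minimal sketch covering only ONE, QUORUM and ALL in a single DC; the function names are hypothetical.

```python
def replicas_required(cl: str, rf: int) -> int:
    """Replicas that must answer for a consistency level in a single-DC keyspace."""
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return rf // 2 + 1  # a majority of the rf replicas
    if cl == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level in this sketch: {cl}")


def guaranteed_tolerable_failures(rf: int, cl: str) -> int:
    """Failures survivable when the down nodes may be ANY nodes, i.e. the worst
    case where they all hold replicas of the same token range.
    Cluster size is deliberately not a parameter."""
    return rf - replicas_required(cl, rf)


if __name__ == "__main__":
    for rf, cl in [(3, "QUORUM"), (3, "ONE"), (5, "QUORUM")]:
        print(rf, cl, guaranteed_tolerable_failures(rf, cl))
    # 3 QUORUM 1
    # 3 ONE 2
    # 5 QUORUM 2
```

The result is the same for a 30-node or a 300-node cluster; only raising RF (or lowering the consistency level) increases the number of arbitrary failures that are guaranteed to be safe.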
> If you assume that N nodes failing together will ALWAYS be from the same
> rack, you can spread your servers across RF physical racks and use
> NetworkTopologyStrategy. While allocating replicas for any data, Cassandra
> will ensure that the 3 replicas are placed in 3 different racks. E.g., you can
> have 10 nodes in each of 3 racks, and then even a 10-node failure within the
> SAME rack leaves you 100% available, as two replicas remain for 100% of the
> data and CL=QUORUM can be met. I have not tested this, but that is how the
> rack concept is expected to work. I agree, using racks generally makes
> operations tougher.
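The rack argument can be checked with a toy model. This is a simplified sketch, assuming replicas are chosen by walking the token ring clockwise and taking the first node of each rack not yet used, which approximates how NetworkTopologyStrategy uses racks; it is not Cassandra's code, and the ring layout, vnode count, seed and names are all hypothetical.

```python
import random


def build_ring(racks: int, nodes_per_rack: int, vnodes: int, seed: int = 0):
    """Hypothetical ring: every node takes `vnodes` random tokens.
    Returns (token, node, rack) tuples sorted by token."""
    rng = random.Random(seed)
    ring = []
    for r in range(racks):
        for n in range(nodes_per_rack):
            for _ in range(vnodes):
                ring.append((rng.random(), f"rack{r}-node{n}", f"rack{r}"))
    ring.sort()
    return ring


def replicas_for(ring, start: int, rf: int) -> list[str]:
    """Simplified rack-aware placement: walk clockwise from `start` and take the
    first node of each rack not yet used, until rf replicas are chosen."""
    chosen, racks_used = [], set()
    for step in range(len(ring)):
        _, node, rack = ring[(start + step) % len(ring)]
        if rack not in racks_used:
            chosen.append(node)
            racks_used.add(rack)
            if len(chosen) == rf:
                return chosen
    raise ValueError("this simplified model needs at least rf distinct racks")


def ranges_below_quorum(ring, rf: int, down: set[str]) -> int:
    """Count token ranges whose live replicas cannot satisfy QUORUM."""
    quorum = rf // 2 + 1
    return sum(
        1
        for i in range(len(ring))
        if sum(node not in down for node in replicas_for(ring, i, rf)) < quorum
    )


if __name__ == "__main__":
    ring = build_ring(racks=3, nodes_per_rack=10, vnodes=16)  # 480 token ranges
    rack0 = {f"rack0-node{n}" for n in range(10)}
    rack1 = {f"rack1-node{n}" for n in range(10)}
    print(ranges_below_quorum(ring, 3, rack0))          # 0: every range keeps 2 replicas
    print(ranges_below_quorum(ring, 3, rack0 | rack1))  # 480: every range is short
```

Losing all 10 nodes of one rack leaves every range with two live replicas, so QUORUM holds; the model says nothing about failures spread across racks, which still fall under the RF - CL worst case.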
> Thanks
> Anuj
> On Mon, 24 Jul 2017 at 20:10, Peng Xiao
> <> wrote:
> Hi Bhuvan,
> The following link advises against using racks (RAC), and its reasoning
> looks sensible:
> Defining one rack for the entire cluster is the simplest and most common
> implementation. Multiple racks should be avoided for the following reasons:
> • Most users tend to ignore or forget rack requirements that state racks
> should be in an alternating order to allow the data to get distributed
> safely and appropriately.
> • Many users are not using the rack information effectively by using a
> setup with as many racks as they have nodes, or similar non-beneficial
> scenarios.
> • When using racks correctly, each rack should typically have the same
> number of nodes.
> • In a scenario that requires a cluster expansion while using racks, the
> expansion procedure can be tedious since it typically involves several node
> moves and has to ensure that racks will be distributing data correctly and
> evenly. At times when clusters need immediate expansion, racks should be the
> last thing to worry about.
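The "alternating order" requirement in the first point matters mainly for single-token (non-vnode) rings, where the order of racks around the ring decides where replicas land. One way to express it as a check is sketched here, assuming "alternating" means every run of k consecutive nodes (k = number of racks) contains each rack exactly once; that interpretation and the function name are assumptions, not the documentation's definition.

```python
def racks_alternate(ring_racks: list[str]) -> bool:
    """ring_racks: the rack of each node, in token order around the ring.
    True if every window of k consecutive nodes (wrapping around the ring,
    k = number of distinct racks) contains every rack exactly once."""
    racks = set(ring_racks)
    k, n = len(racks), len(ring_racks)
    return all(
        {ring_racks[(i + j) % n] for j in range(k)} == racks
        for i in range(n)
    )


if __name__ == "__main__":
    print(racks_alternate(["r1", "r2", "r3"] * 3))                # True
    print(racks_alternate(["r1", "r1", "r2", "r2", "r3", "r3"]))  # False
```

Under this definition a ring whose racks have unequal node counts can never pass, which matches the third point about keeping racks the same size.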
> ------------------ Original Message ------------------
> *From:* "Bhuvan Rawal";<>;
> *Sent:* Monday, 24 July 2017, 7:17 PM
> *To:* "user"<>;
> *Subject:* Re: tolerate how many nodes down in the cluster
> Hi Peng,
> This really depends on how you have configured your topology. Say you
> have segregated your DC into 3 racks with 10 servers each. With RF of 3 you
> can safely assume your data to be available if one rack goes down.
> But if servers in different racks fail, then I guess you are not
> guaranteeing data integrity with RF of 3; in that case you can lose at most 2
> servers and still be available. The best idea would be to plan failover modes
> appropriately and let Cassandra know of the same.
> Regards,
> Bhuvan
> On Mon, Jul 24, 2017 at 3:28 PM, Peng Xiao <> wrote:
> Hi,
> Suppose we have a 30-node cluster in one DC with RF=3.
> How many nodes can be down? Can we tolerate 10 nodes down?
> It seems that we cannot prevent all 3 replicas of some data from landing
> within those 10 nodes,
> so can we only tolerate 1 node down even though we have 30 nodes?
> Could anyone please advise?
> Thanks
