cassandra-user mailing list archives

From Markus Jais <>
Subject Re: Replication Factor question
Date Tue, 15 Apr 2014 08:37:44 GMT
Hi all,

thanks for your answers. Very helpful. We plan to use enough nodes so that the failure of
1 or 2 machines is no problem. For example, for a workload that can be handled by 3 nodes at
all times, we would use at least 5, preferably 6, nodes to survive the failure of 2 nodes,
even if both fail at the same time. This should allow the cluster to rebuild the missing
nodes and still serve all requests with RF=3 and QUORUM reads.

All the best,


On Monday, 14 April 2014 21:23, Tupshin Harper <> wrote:
tl;dr make sure you have enough capacity in the event of node failure. For light workloads,
that can be fulfilled with nodes=rf. 
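Tupshin's tl;dr can be expressed as a one-line sizing rule: keep enough nodes that the survivors can still carry the load after the failures you want to tolerate, and never fewer nodes than RF. The sketch below assumes load can be expressed in whole-node equivalents (a simplification) and ignores the rebuild overhead Robert describes. `min_nodes` is a hypothetical helper, not part of any Cassandra tool.

```python
import math


def min_nodes(rf: int, workload_nodes: float, failures: int) -> int:
    """Smallest cluster size that still has `workload_nodes` worth of capacity
    after `failures` nodes are down, and never fewer nodes than RF.

    `workload_nodes` (load measured in node-equivalents) is an assumption;
    streaming/rebuild overhead is deliberately ignored here.
    """
    return max(rf, math.ceil(workload_nodes) + failures)


if __name__ == "__main__":
    # Light workload: half a node of load, tolerate one failure -> nodes = RF.
    print(min_nodes(rf=3, workload_nodes=0.5, failures=1))
    # Markus's case: 3 nodes of load, tolerate two simultaneous failures.
    print(min_nodes(rf=3, workload_nodes=3, failures=2))
```

For a light workload the `rf` term dominates, which is the "nodes=rf" case; for a workload needing 3 nodes with 2 tolerated failures, the capacity term gives 5, matching the "at least 5" figure above before any allowance for rebuild load.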
>On Apr 14, 2014 2:35 PM, "Robert Coli" <> wrote:
>On Mon, Apr 14, 2014 at 2:25 AM, Markus Jais <> wrote:
>>"It is generally not recommended to set a replication factor of 3 if you have fewer
>>than six nodes in a data center".
>>I have a detailed post about this somewhere in the archives of this list (which I
>>can't seem to find right now), but briefly, the "6-for-3" advice relates to the
>>percentage of capacity you have remaining when you have a node down. It has become
>>slightly less accurate over time because vnodes reduce bootstrap time and there have
>>been other improvements to node startup time.
>>If you have fewer than 6 nodes with RF=3, you lose more than 1/6 of your capacity when
>>you lose a single node, which is a significant percentage of total cluster capacity. You
>>then lose another meaningful percentage of your capacity when your existing nodes
>>participate in rebuilding the missing node. If you are then unlucky enough to lose
>>another node, you are missing a very significant percentage of your cluster capacity and
>>have to use a relatively small fraction of it to rebuild the two nodes that are now down.
>>I wouldn't generalize the rule of thumb as "don't run under N=RF*2", but rather as
>>"probably don't run RF=3 under about 6 nodes". IOW, in my view, the most operationally
>>sane initial number of nodes for RF=3 is likely closer to 6 than 3.
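Robert's capacity argument can be checked numerically. With N equal-sized nodes and evenly spread load, one node down removes 1/N of capacity, which is more than 1/6 whenever N < 6. The sketch below assumes equal nodes and even load; `rebuild_cost` is a hypothetical fraction of each survivor's capacity spent streaming data to rebuild down nodes, since the thread does not quantify that overhead, and the 1/4 used in the example is purely illustrative.

```python
from fractions import Fraction


def remaining_capacity(n_nodes: int, down: int,
                       rebuild_cost: Fraction = Fraction(0)) -> Fraction:
    """Fraction of the original cluster capacity left for client traffic.

    Assumes equal-sized nodes and evenly spread load. `rebuild_cost` is a
    hypothetical share of each surviving node's capacity consumed by
    streaming data to rebuild the down nodes.
    """
    alive = Fraction(n_nodes - down, n_nodes)
    return alive * (1 - rebuild_cost)


if __name__ == "__main__":
    hypothetical_rebuild = Fraction(1, 4)  # illustrative only, not measured
    for n in range(3, 7):
        one_down = remaining_capacity(n, 1)
        two_down = remaining_capacity(n, 2, hypothetical_rebuild)
        print(f"N={n}: lose {1 - one_down} with one node down; "
              f"{float(two_down):.0%} left with two down while rebuilding")
```

Exact fractions keep the 1/N comparison with 1/6 free of rounding. Whatever value is assumed for `rebuild_cost`, smaller clusters start the second failure with a much thinner margin, which is the intuition behind "closer to 6 than 3".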