You're right, the math does assume independence, which is unlikely to be accurate.  But if you do have correlated failure modes, e.g. same power, racks, DC, etc., then you can still use Cassandra's rack-aware or DC-aware features to ensure replicas are spread around so your cluster can survive the correlated failure mode.  So I would expect vnodes to improve uptime in all scenarios, but haven't done the math to prove it.
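For reference, the rack/DC awareness mentioned here is configured through the keyspace's replication strategy; `NetworkTopologyStrategy` places replicas in the requested data centers and avoids putting them on the same rack where possible. A minimal sketch (the keyspace and data center names are placeholders, not from this thread):

```
CREATE KEYSPACE myks
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,   -- three replicas in data center DC1
    'DC2': 3    -- three replicas in data center DC2
  };
```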

Richard.

On 9 December 2012 17:50, Tyler Hobbs wrote:
Nicolas,

Strictly speaking, your math makes the assumption that the failures of different nodes are probabilistically independent events. This is, of course, not an accurate assumption for real-world conditions.  Nodes share racks, networking equipment, power, availability zones, data centers, etc.  So, I think the mathematical assertion is not quite as strong as one would like, but it's certainly a good argument for handling certain types of node failures.

On Fri, Dec 7, 2012 at 11:27 AM, Nicolas Favre-Felix wrote:
Hi Eric,

Your concerns are perfectly valid.

We (Acunu) led the design and implementation of this feature and spent a long time looking at the impact of such a large change.
We summarized some of our notes and wrote about the impact of virtual nodes on cluster uptime a few months back: http://www.acunu.com/2/post/2012/10/improving-cassandras-uptime-with-virtual-nodes.html.
The main argument in this blog post is that you only have a failure to perform quorum reads/writes if at least RF replicas fail within the time it takes to rebuild the first dead node. We show that virtual nodes actually decrease the probability of failure, by streaming data from all nodes and thereby improving the rebuild time.
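A rough back-of-the-envelope sketch of that argument, under the independence assumption discussed in this thread (all failure rates and rebuild times below are made up for illustration, not taken from the blog post):

```python
# Sketch: quorum is lost only if RF-1 further replicas fail within the
# rebuild window for the first dead node. Vnodes widen the set of nodes
# whose failure could matter, but shrink the window by streaming the
# rebuild from all N-1 survivors. Assumes independent failures.
from math import comb

def p_window_failure(n_others, needed, rate_per_hour, window_hours):
    """P(at least `needed` of `n_others` nodes fail during the window)."""
    p = 1 - (1 - rate_per_hour) ** window_hours  # per-node failure prob
    return sum(comb(n_others, k) * p ** k * (1 - p) ** (n_others - k)
               for k in range(needed, n_others + 1))

N, RF = 100, 3
rate = 1e-4      # hypothetical per-node failure probability per hour
rebuild = 10.0   # hypothetical hours to rebuild from a single source

# Without vnodes: only the RF-1 specific replica partners matter,
# but the rebuild window is long.
without_vnodes = p_window_failure(RF - 1, RF - 1, rate, rebuild)
# With vnodes: any RF-1 of the other N-1 nodes could matter, but the
# rebuild streams from all of them, so the window shrinks ~(N-1)x.
with_vnodes = p_window_failure(N - 1, RF - 1, rate, rebuild / (N - 1))

print(with_vnodes < without_vnodes)  # True for these parameters
```

For these (hypothetical) numbers the shorter rebuild window outweighs the larger set of relevant nodes, which is the shape of the blog post's claim.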

Regards,

Nicolas

On Wed, Dec 5, 2012 at 4:45 PM, Eric Parusel wrote:
Hi all,

I've been wondering about virtual nodes and how cluster uptime might change as cluster size increases.

I understand clusters will benefit from increased reliability due to faster rebuild time, but does that hold true for large clusters?

It seems that (and correct me if I'm wrong here) since every physical node will likely share some small amount of data with every other node, as the count of physical nodes in a Cassandra cluster increases (let's say into the triple digits), the probability of at least one failure to Quorum read/write occurring in a given time period would *increase*.

Would this hold true, at least until the number of physical nodes becomes greater than num_tokens per node?

I understand that the window of failure for affected ranges would probably be small, but we do Quorum reads of many keys, so we'd likely hit every virtual range with our queries, even if num_tokens was 256.
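Eric's worry here reduces to a one-line probability: if a workload touches every token range, then any range that is currently below quorum causes a failure, and the chance of at least one such range grows with the number of ranges touched. A sketch with made-up numbers:

```python
# Hypothetical illustration: p_range is a made-up probability that any
# given token range currently lacks a quorum of live replicas. Touching
# more ranges raises the chance that at least one query fails.
def p_any_degraded(num_ranges, p_range):
    return 1 - (1 - p_range) ** num_ranges

p_range = 1e-5                              # made-up per-range probability
small = p_any_degraded(16, p_range)         # workload touching few ranges
large = p_any_degraded(256 * 100, p_range)  # every vnode range, 100 nodes

print(small < large)  # True: more ranges touched, higher failure chance
```

This is exactly the effect the earlier replies address: vnodes also shrink the time each range spends degraded, so the per-range probability itself drops as the cluster grows.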

Thanks,
Eric

--
Tyler Hobbs
DataStax

--
Richard Low
Acunu | http://www.acunu.com | @acunu