Thanks for your thoughts guys.
I agree that with vnodes total downtime is lessened. However, it also
seems that the total number of outages (however small) would be greater.
But I think downtime is only lessened up to a certain cluster size.
I'm thinking that as the cluster continues to grow:
- node rebuild time will max out (a single node only has so much write
bandwidth)
- the probability of 2 nodes being down at any given time will continue
to increase -- even if you consider only non-correlated failures.
Therefore, once nodes are added beyond the point where node rebuild time
maxes out, wouldn't both the total number of outages *and* overall downtime
increase?
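To make the second point concrete, here is a minimal sketch. It assumes only independent, non-correlated failures, with each node down at any moment with the same probability `p_down` (roughly rebuild time divided by mean time between failures). `p_at_least_two_down` is a hypothetical helper, not anything in Cassandra. If rebuild time has hit its floor, `p_down` stops shrinking, and the chance that two or more nodes are down at once keeps rising with node count:

```python
def p_at_least_two_down(n_nodes: int, p_down: float) -> float:
    """Probability that two or more of n_nodes are down at the same time.

    Assumption: every node is down independently with probability p_down
    (e.g. rebuild_time / mean_time_between_failures).
    """
    p_none = (1 - p_down) ** n_nodes
    p_exactly_one = n_nodes * p_down * (1 - p_down) ** (n_nodes - 1)
    return 1 - p_none - p_exactly_one


if __name__ == "__main__":
    # Hypothetical per-node unavailability, held fixed to model a rebuild
    # time that no longer improves as the cluster grows.
    p = 0.001
    for n in (6, 12, 24, 48, 96, 192):
        print(f"{n:4d} nodes: P(>=2 down) = {p_at_least_two_down(n, p):.6f}")
```

Whether two down nodes actually cause an outage still depends on replica placement, which is the subject of the replies quoted below.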
Thanks,
Eric
On Mon, Dec 10, 2012 at 7:00 AM, Edward Capriolo wrote:
> Assume you need to work with QUORUM in a non-vnode scenario. That means
> that if 2 consecutive nodes in the ring are down, some quorum
> operations will fail with UnavailableException (TimeoutException right
> after the failures). This is because for a given range of tokens quorum
> will be impossible, while quorum will still be possible for others.
>
> In a vnode world, if any two nodes are down, then the intersection of
> their vnode token ranges is unavailable.
>
> I think it is two sides of the same coin.
>
>
> On Mon, Dec 10, 2012 at 7:41 AM, Richard Low wrote:
>
>> Hi Tyler,
>>
>> You're right, the math does assume independence, which is unlikely to be
>> accurate. But if you do have correlated failure modes, e.g. same power,
>> racks, DC, etc., then you can still use Cassandra's rack-aware or DC-aware
>> features to ensure replicas are spread around so your cluster can survive
>> the correlated failure mode. So I would expect vnodes to improve uptime in
>> all scenarios, but I haven't done the math to prove it.
>>
>> Richard.
>>
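Edward's "two sides of the same coin" point can be illustrated with a small simulation. This is a sketch under stated assumptions, not Cassandra's code: RF=3 with QUORUM=2 (the thread does not state an RF), SimpleStrategy-style placement where each range is replicated on its owner plus the next distinct nodes clockwise, a made-up token space `RING_SIZE`, and hypothetical helpers `build_ring`, `replicas` and `unavailable_fraction`. It ignores rack- or DC-aware placement. For a given set of down nodes, it reports the fraction of the ring that cannot reach quorum:

```python
import random

RING_SIZE = 2**32  # hypothetical token space, not Cassandra's real range


def build_ring(n_nodes, tokens_per_node, seed=0):
    """Return a sorted list of (token, node) pairs."""
    if tokens_per_node == 1:
        # Classic setup: one evenly spaced token per node.
        return [(i * RING_SIZE // n_nodes, i) for i in range(n_nodes)]
    # vnode setup: each node picks tokens_per_node random tokens.
    rng = random.Random(seed)
    ring = [(rng.randrange(RING_SIZE), node)
            for node in range(n_nodes) for _ in range(tokens_per_node)]
    ring.sort()
    return ring


def replicas(ring, idx, rf):
    """Owner of ring[idx] plus the next distinct nodes walking clockwise."""
    found = []
    i = idx
    while len(found) < rf:
        node = ring[i % len(ring)][1]
        if node not in found:
            found.append(node)
        i += 1
    return found


def unavailable_fraction(ring, down, rf=3):
    """Fraction of the token space with fewer than quorum live replicas."""
    if rf > len({node for _, node in ring}):
        raise ValueError("rf exceeds the number of distinct nodes")
    quorum = rf // 2 + 1
    lost = 0
    for idx, (token, _) in enumerate(ring):
        # Range (prev, token] belongs to ring[idx]; wrap around for idx == 0.
        prev = ring[idx - 1][0] - (RING_SIZE if idx == 0 else 0)
        live = sum(1 for n in replicas(ring, idx, rf) if n not in down)
        if live < quorum:
            lost += token - prev
    return lost / RING_SIZE


if __name__ == "__main__":
    for label, ring in (("no vnodes", build_ring(12, 1)),
                        ("256 vnodes", build_ring(12, 256))):
        pair_01 = unavailable_fraction(ring, {0, 1})
        pair_06 = unavailable_fraction(ring, {0, 6})
        print(f"{label}: nodes 0+1 down {pair_01:.3f}, nodes 0+6 down {pair_06:.3f}")
```

Without vnodes, losing nodes 0 and 1 (ring neighbours) makes about a sixth of the ring unavailable, while losing nodes 0 and 6 costs nothing. With 256 vnodes per node, both pairs make a smaller slice unavailable. That is the trade-off discussed above: fewer but larger outages versus more frequent but smaller ones.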