Thanks for your thoughts, guys.
I agree that with vnodes total downtime is lessened, although it also
seems that the total number of outages (however small) would be greater.
But I think downtime is only lessened up to a certain cluster size.
I'm thinking that as the cluster continues to grow:
- node rebuild time will max out (a single node only has so much write
  bandwidth)
- the probability of 2 nodes being down at any given time will continue
  to increase, even if you consider only non-correlated failures.
Therefore, when adding nodes beyond the point where node rebuild time maxes
out, both the total number of outages *and* overall downtime would increase?
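
To put rough numbers on the second point, here is a minimal Python sketch.
It assumes independent failures and a fixed per-node unavailability
p = rebuild / (MTBF + rebuild). The function names and the MTBF and rebuild
figures are made up for illustration, not measurements from any real cluster.

```python
def node_unavailability(mtbf_hours: float, rebuild_hours: float) -> float:
    """Fraction of time one node is down (simple steady-state assumption)."""
    return rebuild_hours / (mtbf_hours + rebuild_hours)


def prob_at_least_two_down(n: int, p: float) -> float:
    """P(2 or more of n nodes are down), assuming independent failures."""
    none_down = (1 - p) ** n
    exactly_one_down = n * p * (1 - p) ** (n - 1)
    return 1 - none_down - exactly_one_down


if __name__ == "__main__":
    # Assumed figures: one failure per node every ~6 months, 6 h rebuild floor.
    p = node_unavailability(mtbf_hours=4380.0, rebuild_hours=6.0)
    for n in (10, 50, 100, 500, 1000):
        print(f"{n:5d} nodes: P(>=2 down) = {prob_at_least_two_down(n, p):.4f}")
```

With p held constant (rebuild time maxed out), the probability that two or
more nodes are down at once only goes up with n. Whether that translates into
more total downtime depends on how much of the ring each double failure
actually takes out, which this sketch does not model.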
Thanks,
Eric
On Mon, Dec 10, 2012 at 7:00 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> Assume you need to work with quorum in a non-vnode scenario. That means
> that if 2 nodes in a row in the ring are down, some number of quorum
> operations will fail with UnavailableException (TimeoutException right
> after the failures). This is because for a given range of tokens quorum
> will be impossible, but quorum will be possible for others.
>
> In a vnode world, if any two nodes are down, then the intersection of the
> vnode token ranges they hold is unavailable.
>
> I think it is two sides of the same coin.
>
>
> On Mon, Dec 10, 2012 at 7:41 AM, Richard Low <rlow@acunu.com> wrote:
>
>> Hi Tyler,
>>
>> You're right, the math does assume independence, which is unlikely to be
>> accurate. But if you do have correlated failure modes (e.g. same power,
>> racks, DC, etc.) then you can still use Cassandra's rack-aware or DC-aware
>> features to ensure replicas are spread around so your cluster can survive
>> the correlated failure mode. So I would expect vnodes to improve uptime in
>> all scenarios, but haven't done the math to prove it.
>>
>> Richard.
>>
>
>
