> Is it possible to configure or write a snitch that would create separate distribution
> zones within the cluster? (e.g. 144 nodes in cluster, split into 12 zones. Data stored to
> node 1 could only be replicated to one of 11 other nodes in the same distribution zone).
This is kind of what NetworkTopologyStrategy (NTS) does if you have nodes in different racks.
A replica is placed in each rack, and the process wraps around and continues until RF replicas
are located. If the number of racks is not equal to the RF you then get some unevenness (hey,
what do you know, that's a real word :) )
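
The placement described above can be sketched in a few lines of Python. This is a simplified model of the description in this message, not the actual NetworkTopologyStrategy code; the function name `place_replicas`, the node names, and the rack layout are all hypothetical. Each pass around the ring takes at most one new replica per rack, and passes repeat until RF replicas are found.

```python
from collections import Counter


def place_replicas(ring, rack_of, start, rf):
    """Pick rf replicas for the range owned by ring[start].

    Simplified model: walk the ring clockwise from `start`, taking at most
    one new node per rack on each pass, and wrap around for another pass
    until rf replicas have been chosen (or every node is used).
    """
    n = len(ring)
    walk = [ring[(start + i) % n] for i in range(n)]
    replicas = []
    while len(replicas) < min(rf, n):
        used_racks = set()  # racks that already received a replica in this pass
        for node in walk:
            if node in replicas or rack_of[node] in used_racks:
                continue
            replicas.append(node)
            used_racks.add(rack_of[node])
            if len(replicas) == rf:
                break
    return replicas


# Hypothetical layout: 6 nodes, 2 racks of unequal size, RF = 3.
ring = ["n0", "n1", "n2", "n3", "n4", "n5"]
rack_of = {"n0": "A", "n1": "A", "n2": "A", "n3": "A", "n4": "B", "n5": "B"}

# Count how many ranges each node ends up holding a replica for.
load = Counter()
for start in range(len(ring)):
    load.update(place_replicas(ring, rack_of, start, rf=3))

print(dict(load))
```

With two racks and RF = 3, the nodes in the smaller rack end up holding more ranges than the others in this model, which is the unevenness mentioned above.
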
Cheers

Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 12/12/2012, at 6:42 AM, Eric Parusel <ericparusel@gmail.com> wrote:
> Ok, thanks Richard. That's good to hear.
>
> However, I still contend that as node count increases to infinity, the probability of
> there being at least two node failures in the cluster at any time would increase to 100%.
>
> I think of this as somewhat analogous to RAID: I would not be comfortable with a 144+
> disk RAID 6 array, no matter the rebuild speed :)
>
> Is it possible to configure or write a snitch that would create separate distribution
> zones within the cluster? (e.g. 144 nodes in cluster, split into 12 zones. Data stored to
> node 1 could only be replicated to one of 11 other nodes in the same distribution zone).
>
>
> On Tue, Dec 11, 2012 at 3:24 AM, Richard Low <rlow@acunu.com> wrote:
> Hi Eric,
>
> The time to recover one node is limited by that node, but the time to recover that's
> most important is just the time to replicate the data that is missing from that node. This
> is the removetoken operation (called removenode in 1.2), and this gets faster the more nodes
> you have.
>
> Richard.
>
>
> On 11 December 2012 08:39, Eric Parusel <ericparusel@gmail.com> wrote:
> Thanks for your thoughts guys.
>
> I agree that with vnodes total downtime is lessened. Although it also seems that the
> total number of outages (however small) would be greater.
>
> But I think downtime is only lessened up to a certain cluster size.
>
> I'm thinking that as the cluster continues to grow:
> - node rebuild time will max out (a single node only has so much write bandwidth)
> - the probability of 2 nodes being down at any given time will continue to increase,
>   even if you consider only non-correlated failures.
>
> Therefore, when adding nodes beyond the point where node rebuild time maxes out, both
> the total number of outages *and* overall downtime would increase?
>
> Thanks,
> Eric
>
>
>
>
> On Mon, Dec 10, 2012 at 7:00 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> Assume you need to work with quorum in a non-vnode scenario. That means that if 2 nodes
> in a row in the ring are down, some number of quorum operations will fail with UnavailableException
> (TimeoutException right after the failures). This is because for a given range of tokens
> quorum will be impossible, but quorum will be possible for others.
>
> In a vnode world, if any two nodes are down, then the intersection of the vnode token ranges
> they hold is unavailable.
>
> I think it is two sides of the same coin.
>
>
> On Mon, Dec 10, 2012 at 7:41 AM, Richard Low <rlow@acunu.com> wrote:
> Hi Tyler,
>
> You're right, the math does assume independence, which is unlikely to be accurate. But
> if you do have correlated failure modes (e.g. same power, racks, DC, etc.) then you can still
> use Cassandra's rack-aware or DC-aware features to ensure replicas are spread around so your
> cluster can survive the correlated failure mode. So I would expect vnodes to improve uptime
> in all scenarios, but haven't done the math to prove it.
>
> Richard.
>
>
>
>
>
> 
> Richard Low
> Acunu | http://www.acunu.com | @acunu
>
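
The claim in this thread that the chance of at least two simultaneous node failures grows toward 100% as the cluster grows can be checked with a short calculation. This is only a sketch under the independence assumption discussed above; the per-node unavailability `p = 0.01` and the function name are illustrative assumptions, not measured values. With n nodes each independently down with probability p, the chance that two or more are down at once is 1 - (1-p)^n - n*p*(1-p)^(n-1).

```python
def prob_at_least_two_down(n, p):
    """Probability that 2 or more of n nodes are down at the same moment,
    assuming each node is down independently with probability p."""
    none_down = (1 - p) ** n
    exactly_one_down = n * p * (1 - p) ** (n - 1)
    return 1 - none_down - exactly_one_down


# p = 0.01 is an illustrative assumption (each node down 1% of the time).
for n in (12, 144, 1000):
    print(n, round(prob_at_least_two_down(n, 0.01), 4))
```

This only says that some pair of nodes is down somewhere in the cluster. Whether that makes any range unavailable at quorum depends on whether the two nodes share replicas, which is the vnode versus non-vnode point made earlier in the thread.
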
