From: Richard Low
Date: Mon, 10 Dec 2012 12:41:11 +0000
To: user@cassandra.apache.org
Subject: Re: Virtual Nodes, lots of physical nodes and potentially increasing outage count?

Hi Tyler,

You're right, the math does assume independence, which is unlikely to be accurate. But if you do have correlated failure modes, e.g. same power, racks, DC, etc., then you can still use Cassandra's rack-aware or DC-aware features to ensure replicas are spread around, so your cluster can survive the correlated failure mode.
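For example, something like the following (a rough sketch; the keyspace, DC and rack names are hypothetical) uses NetworkTopologyStrategy together with GossipingPropertyFileSnitch, so that each range's replicas land in both data centres and, within a DC, on distinct racks where possible:

    # cassandra.yaml
    endpoint_snitch: GossipingPropertyFileSnitch

    # cassandra-rackdc.properties on each node, describing where it lives
    dc=DC1
    rack=RACK1

    -- CQL: three replicas in each of two data centres
    CREATE KEYSPACE myapp
      WITH replication = {'class': 'NetworkTopologyStrategy',
                          'DC1': 3, 'DC2': 3};

With that layout (and at least RF racks per DC), a whole rack failing takes out at most one replica of any range in that DC, and losing an entire DC still leaves a complete replica set in the other one.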
So I would expect vnodes to improve uptime in all scenarios, but haven't done the math to prove it.

Richard.

On 9 December 2012 17:50, Tyler Hobbs wrote:
> Nicolas,
>
> Strictly speaking, your math makes the assumption that the failures of
> different nodes are probabilistically independent events. This is, of
> course, not an accurate assumption for real-world conditions. Nodes share
> racks, networking equipment, power, availability zones, data centers, etc.
> So, I think the mathematical assertion is not quite as strong as one would
> like, but it's certainly a good argument for handling certain types of
> node failures.
>
>
> On Fri, Dec 7, 2012 at 11:27 AM, Nicolas Favre-Felix wrote:
>
>> Hi Eric,
>>
>> Your concerns are perfectly valid.
>>
>> We (Acunu) led the design and implementation of this feature and spent a
>> long time looking at the impact of such a large change.
>> We summarized some of our notes and wrote about the impact of virtual
>> nodes on cluster uptime a few months back:
>> http://www.acunu.com/2/post/2012/10/improving-cassandras-uptime-with-virtual-nodes.html
>>
>> The main argument in this blog post is that you only have a failure to
>> perform quorum reads/writes if at least RF replicas fail within the time
>> it takes to rebuild the first dead node. We show that virtual nodes
>> actually decrease the probability of failure, by streaming data from all
>> nodes and thereby improving the rebuild time.
>>
>> Regards,
>>
>> Nicolas
>>
>>
>> On Wed, Dec 5, 2012 at 4:45 PM, Eric Parusel wrote:
>>
>>> Hi all,
>>>
>>> I've been wondering about virtual nodes and how cluster uptime might
>>> change as cluster size increases.
>>>
>>> I understand clusters will benefit from increased reliability due to
>>> faster rebuild time, but does that hold true for large clusters?
>>>
>>> It seems that (and correct me if I'm wrong here), since every physical
>>> node will likely share some small amount of data with every other node,
>>> as the count of physical nodes in a Cassandra cluster increases (let's
>>> say into the triple digits) the probability of at least one failure to
>>> Quorum read/write occurring in a given time period would *increase*.
>>>
>>> Would this hold true, at least until the number of physical nodes
>>> becomes greater than num_tokens per node?
>>>
>>> I understand that the window of failure for affected ranges would
>>> probably be small, but we do Quorum reads of many keys, so we'd likely
>>> hit every virtual range with our queries, even if num_tokens was 256.
>>>
>>> Thanks,
>>> Eric
>>>
>>
>
> --
> Tyler Hobbs
> DataStax
>

--
Richard Low
Acunu | http://www.acunu.com | @acunu
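To put very rough numbers on the uptime argument above (a back-of-the-envelope sketch only: it keeps the independence assumption that Tyler notes is optimistic, assumes RF=3 with ring-neighbour placement, and treats rebuild-time scaling as the blog post describes; all of these are simplifying assumptions, not measurements). A range loses quorum if a second replica fails while the first dead node is still being rebuilt, so approximately:

    P(some range loses quorum) = 1 - (1 - T/MTBF)^k  ~  k * T/MTBF    (for small T/MTBF)

where T is the rebuild window and k is the number of other nodes that share at least one range with the dead node. Without vnodes, k ~ 2*(RF-1) = 4 (only the ring neighbours), but T is the full rebuild time from those few sources. With vnodes, k grows towards N-1, but if the rebuild can stream from all of those nodes then T shrinks by roughly the same factor. To first order the two effects cancel, so the wider data sharing does not by itself make quorum failures more likely; whether vnodes then come out ahead hinges on how much the rebuild actually speeds up, which is precisely the point the blog post argues.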
