Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of arodrime@gmail.com designates
 209.85.215.50 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAMR1f-fPR_m+b3O=si0Y3gee2n82v7KdW0c16w0A=dALB8Uy0w@mail.gmail.com>
References: 
 <CAMR1f-etXv8rCooaUuYZvuuZYuRkZz=khf9s8tZMJsrBs6Vadw@mail.gmail.com>
 <CALoo1W22hr=rjbxhwngcKesrS15sGXQLPF6+oYO6OQFbVR7Oug@mail.gmail.com>
 <CAMR1f-e9pa1Fgy=+ZhutJ2XnrYqF38U70gUqVusBjdTvg4LzhQ@mail.gmail.com>
 <CAAwnuDudjANY2AFba9xE5MMXcW9J3VYp6DNhLsstnU2E2jtRCA@mail.gmail.com>
 <CAMR1f-fPR_m+b3O=si0Y3gee2n82v7KdW0c16w0A=dALB8Uy0w@mail.gmail.com>
From: Alain RODRIGUEZ <arodrime@gmail.com>
Date: Tue, 11 Jun 2013 08:37:33 +0200
Message-ID: 
 <CA+VSrLoJtUWdtxjLUnhQoGkhVNUQAYBa0wjAPUZpY1aM+dqapw@mail.gmail.com>
Subject: Re: Why so many vnodes?
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=001a11c373600323c904dedb2305

--001a11c373600323c904dedb2305
Content-Type: text/plain; charset=ISO-8859-1

I think he actually meant *increase*, for this reason: "For small T, a
random choice of initial tokens will in most cases give a poor distribution
of data.  The larger T is, the closer to uniform the distribution will be,
with increasing probability."
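
The quoted claim can be checked with a small simulation. This is a minimal
sketch under simplifying assumptions, not Cassandra's code: the ring is
modelled as [0, 1), keys hash uniformly, replication is ignored, and each of
N nodes picks T tokens uniformly at random. It reports how far the
most-loaded node sits above its fair share of 1/N. The function names
`ownership` and `imbalance` are hypothetical.

```python
import random


def ownership(num_nodes: int, tokens_per_node: int, seed: int = 0) -> list[float]:
    """Fraction of the ring owned by each node when every node picks
    `tokens_per_node` random tokens (toy model: ring is [0, 1),
    replication ignored, keys hash uniformly)."""
    rng = random.Random(seed)
    # (token, owner) pairs for every vnode in the cluster, sorted around the ring.
    ring = sorted(
        (rng.random(), node)
        for node in range(num_nodes)
        for _ in range(tokens_per_node)
    )
    owned = [0.0] * num_nodes
    for i, (token, node) in enumerate(ring):
        # A token owns the range (previous token, token]; the first token
        # wraps around and owns the gap from the last token back through 0.
        prev = ring[i - 1][0] if i > 0 else ring[-1][0] - 1.0
        owned[node] += token - prev
    return owned


def imbalance(num_nodes: int, tokens_per_node: int, seed: int = 0) -> float:
    """Most-loaded node's share divided by the ideal share 1/N."""
    return max(ownership(num_nodes, tokens_per_node, seed)) * num_nodes


for t in (1, 4, 32, 256):
    print(f"T={t:>3}: most-loaded node holds {imbalance(6, t):.2f}x its fair share")
```

As T grows the ratio should move towards 1.0, which is the behaviour Richard
describes; the exact figures depend on the random seed.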

Alain


2013/6/11 Theo Hultberg <theo@iconara.net>

> Thanks, that makes sense, but I assume that in your last sentence you mean
> decrease it for large clusters, not increase it?
>
> T#
>
>
> On Mon, Jun 10, 2013 at 11:02 PM, Richard Low <richard@wentnet.com> wrote:
>
>> Hi Theo,
>>
>> The number 256 (let's call it T, and the number of nodes N) was chosen to
>> give good load balancing for random token assignments for most cluster
>> sizes.  For small T, a random choice of initial tokens will in most cases
>> give a poor distribution of data.  The larger T is, the closer to uniform
>> the distribution will be, with increasing probability.
>>
>> Also, for small T, when a new node is added, it won't have many ranges to
>> split, so it won't be able to take an even slice of the data.
>>
>> For this reason T should be large.  But if it is too large, there are too
>> many slices to keep track of, as you say.  The function to find which keys
>> live where becomes more expensive, and operations that deal with individual
>> vnodes, such as repair, become slow.  (An extreme example is SELECT * LIMIT 1,
>> which, when there is no data, has to scan each vnode in turn in search of a
>> single row.  This is O(NT) and, even for quite small T, takes seconds to
>> complete.)
>>
>> So 256 was chosen to be a reasonable balance.  I don't think most users
>> will find it too slow; users with extremely large clusters may need to
>> increase it.
>>
>> Richard.
>>
>>
>> On 10 June 2013 18:55, Theo Hultberg <theo@iconara.net> wrote:
>>
>>> I'm not sure I follow what you mean, or if I've misunderstood what
>>> Cassandra is telling me. Each node has 256 vnodes (or tokens, as the
>>> preferred name seems to be). When I run `nodetool status` each node is
>>> reported as having 256 vnodes, regardless of how many nodes are in the
>>> cluster. A single node cluster has 256 vnodes on the single node; a six
>>> node cluster has 256 vnodes on each machine, making 1536 vnodes in total.
>>> When I run `SELECT tokens FROM system.peers` or `nodetool ring` each node
>>> lists 256 tokens.
>>>
>>> This is different from how it works in Riak and Voldemort, if I'm not
>>> mistaken, and that is the source of my confusion.
>>>
>>> T#
>>>
>>>
>>> On Mon, Jun 10, 2013 at 4:54 PM, Milind Parikh <milindparikh@gmail.com> wrote:
>>>
>>>> There are n vnodes regardless of the size of the physical cluster.
>>>> Regards
>>>> Milind
>>>> On Jun 10, 2013 7:48 AM, "Theo Hultberg" <theo@iconara.net> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> The default number of vnodes is 256; is there any significance in this
>>>>> number? Since Cassandra's vnodes don't work like, for example, Riak's, where
>>>>> there is a fixed number of vnodes distributed evenly over the nodes, why so
>>>>> many? Even with a moderately sized cluster you get thousands of slices.
>>>>> Does this matter? If your cluster grows to over thirty machines and you
>>>>> start looking at ten thousand slices, would that be a problem? I guess that
>>>>> traversing a list of a thousand or ten thousand slices to find where a
>>>>> token lives isn't a huge problem, but are there any other upsides or
>>>>> downsides to having a small or large number of vnodes per node?
>>>>>
>>>>> I understand the benefits of splitting up the ring into pieces, for
>>>>> example to be able to stream data from more nodes when bootstrapping a new
>>>>> one, but that works even if each node only has, say, 32 vnodes (unless your
>>>>> cluster is truly huge).
>>>>>
>>>>> yours,
>>>>> Theo
>>>>>
>>>>
>>>
>>
>
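
On the cost of finding where a token lives: if all of the cluster's tokens
are kept in one sorted list, the owner of a key can be found by binary
search, so even 30 nodes with 256 tokens each (7680 entries) need only about
13 comparisons. The sketch below is a toy model with hypothetical names
(`Ring`, `owner`), 64-bit integer tokens and no replication; it is not
Cassandra's implementation. Operations that visit every vnode in turn, such
as repair or the empty-table SELECT * LIMIT 1 that Richard mentions, stay
linear in N*T however the lookup is done.

```python
import bisect
import random


class Ring:
    """Toy token ring (hypothetical, not Cassandra's code): each node owns
    `tokens_per_node` random integer tokens in [0, 2**64); replication is
    ignored."""

    def __init__(self, nodes: list[str], tokens_per_node: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        pairs = sorted(
            (rng.randrange(2**64), node)
            for node in nodes
            for _ in range(tokens_per_node)
        )
        self.tokens = [token for token, _ in pairs]
        self.owners = [node for _, node in pairs]

    def owner(self, key_token: int) -> str:
        # The owner is the first token >= key_token (it owns (prev, token]);
        # past the last token, wrap around to the first. Binary search makes
        # this O(log(N*T)) instead of a linear scan over every vnode.
        i = bisect.bisect_left(self.tokens, key_token)
        return self.owners[i % len(self.tokens)]


ring = Ring([f"node{i}" for i in range(6)], 256)
print(len(ring.tokens))
print(ring.owner(12345))
```

For the six-node example above the ring holds 6 * 256 = 1536 tokens.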

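Richard's bootstrap point, and Theo's remark about streaming from more nodes,
can be illustrated the same way: when a node joins with T random tokens, each
token takes over part of exactly one existing range, so T bounds how many
existing nodes it takes data from. Another minimal sketch under the same
assumptions (ring modelled as [0, 1), replication ignored); the function
name `stream_sources` is hypothetical.

```python
import bisect
import random


def stream_sources(num_nodes: int, tokens_per_node: int, seed: int = 0) -> dict[int, float]:
    """Existing nodes that give up data when a new node joins with
    `tokens_per_node` random tokens, mapped to the fraction of the ring each
    one gives up (toy model: ring is [0, 1), replication ignored)."""
    rng = random.Random(seed)
    old = sorted(
        (rng.random(), node)
        for node in range(num_nodes)
        for _ in range(tokens_per_node)
    )
    old_tokens = [token for token, _ in old]
    new_tokens = sorted(rng.random() for _ in range(tokens_per_node))
    merged = sorted(old_tokens + new_tokens)

    taken: dict[int, float] = {}
    for token in new_tokens:
        # The new node takes over (previous token on the merged ring, token].
        j = bisect.bisect_left(merged, token)
        prev = merged[j - 1] if j > 0 else merged[-1] - 1.0
        # That range previously belonged to the first old token >= token.
        i = bisect.bisect_left(old_tokens, token)
        source = old[i % len(old)][1]
        taken[source] = taken.get(source, 0.0) + (token - prev)
    return taken


for t in (1, 32, 256):
    sources = stream_sources(6, t)
    print(f"T={t:>3}: new node streams from {len(sources)} of 6 nodes")
```

With T = 1 the joining node takes its whole share from a single existing
node; with T = 256 on a six-node cluster it will almost certainly take a
slice from every node.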
--001a11c373600323c904dedb2305--