From: Richard Low
Date: Tue, 11 Jun 2013 10:05:24 +0100
Subject: Re: Why so many vnodes?
To: user@cassandra.apache.org

On 11 June 2013 09:54, Theo Hultberg wrote:

> But in the paragraph just before, Richard said that finding the node that
> owns a token becomes slower on large clusters with lots of token ranges,
> so increasing it further seems contradictory.

I do mean increase for larger clusters, but I guess it depends on what you
are optimizing for. If you care about maintaining an even load, where
differences are measured relative to the amount of data each node has, then
you need T >> N.

However, you're right, this can slow down some operations. Repair has a
fixed cost for each token, so it gets a bit slower with higher T. Finding
which node owns a range gets harder with higher T, but this code was
optimized, so I don't think it will become a practical issue.

> Is this a correct interpretation: finding the node that owns a particular
> token becomes slower as the number of nodes (and therefore total token
> ranges) increases, but for large clusters you also need to take the time
> for bootstraps into account, which will become slower if each node has
> fewer token ranges. The speeds referred to in the two cases are the speeds
> of different operations, so there is a trade-off, and 256 initial tokens
> is a trade-off that works for most cases.

Yes, this is right.
The bootstraps may become slower because the joining node is streaming from
fewer source nodes (although this may only show on very busy clusters, since
otherwise bootstrap is limited by the joining node). More importantly, I
think, new nodes won't take an even share of the data if T is too small.

Richard.
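The T >> N point above can be illustrated with a small simulation. This is an editor's sketch, not code from the thread: `simulate_ownership`, the node count of 12, and the seed are all illustrative, the ring is simplified to [0, 1), and replication is ignored. With T = 1 token per node, random placement leaves ownership very uneven; as T grows, each node's share converges toward 1/N.

```python
import random

def simulate_ownership(num_nodes, tokens_per_node, seed=42):
    """Place num_nodes * tokens_per_node random tokens on a [0, 1) ring
    and return each node's total owned fraction of the ring."""
    rng = random.Random(seed)
    # (token position, owning node), sorted around the ring.
    tokens = sorted(
        (rng.random(), node)
        for node in range(num_nodes)
        for _ in range(tokens_per_node)
    )
    owned = [0.0] * num_nodes
    for i, (tok, node) in enumerate(tokens):
        # Each token owns the range from the previous token up to itself;
        # the first token wraps around past the last one.
        prev = tokens[i - 1][0] if i > 0 else tokens[-1][0] - 1.0
        owned[node] += tok - prev
    return owned

for t in (1, 4, 64, 256):
    owned = simulate_ownership(num_nodes=12, tokens_per_node=t)
    print(f"T={t:4d}  max/min ownership ratio: {max(owned) / min(owned):.2f}")
```

The printed max/min ratio shrinks toward 1.0 as T increases, which is the "even share" effect: a node joining with many vnodes picks up many small, roughly average-sized ranges rather than a few ranges of luck-dependent size.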