incubator-cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Why so many vnodes?
Date Tue, 11 Jun 2013 06:37:33 GMT
I think he actually meant *increase*, for this reason: "For small T, a
random choice of initial tokens will in most cases give a poor distribution
of data.  The larger T is, the closer to uniform the distribution will be,
with increasing probability."
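
The effect can be sketched with a quick simulation (hypothetical Python, not
Cassandra code; the function and parameter names are made up for
illustration): each of N nodes picks T random tokens on a unit ring, and we
measure the worst max/min ownership ratio across a few trials.

```python
import random

# Hypothetical sketch, not Cassandra internals: estimate how evenly data
# spreads when each of n_nodes picks tokens_per_node random tokens (T)
# on a hash ring modelled as [0, 1).
def ownership_spread(n_nodes, tokens_per_node, trials=20, seed=42):
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        # Assign tokens_per_node random tokens to each node.
        tokens = sorted(
            (rng.random(), node)
            for node in range(n_nodes)
            for _ in range(tokens_per_node)
        )
        owned = [0.0] * n_nodes
        # Each token owns the ring segment that ends at it; the first
        # segment wraps around from the last token.
        prev = tokens[-1][0] - 1.0
        for pos, node in tokens:
            owned[node] += pos - prev
            prev = pos
        ratio = max(owned) / min(owned)  # 1.0 means perfectly uniform
        worst = max(worst, ratio)
    return worst

for t in (1, 8, 32, 256):
    print(t, round(ownership_spread(6, t), 2))
```

In this sketch the worst-case ratio shrinks toward 1 as T grows, matching
the quoted point that larger T gives a closer-to-uniform distribution.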

Alain


2013/6/11 Theo Hultberg <theo@iconara.net>

> thanks, that makes sense, but I assume in your last sentence you mean
> decrease it for large clusters, not increase it?
>
> T#
>
>
> On Mon, Jun 10, 2013 at 11:02 PM, Richard Low <richard@wentnet.com> wrote:
>
>> Hi Theo,
>>
>> The number of vnodes per node (call it T, with N the number of nodes) was
>> chosen as 256 to give good load balancing for random token assignments for
>> most cluster sizes.  For small T, a random choice of initial tokens will in most cases
>> give a poor distribution of data.  The larger T is, the closer to uniform
>> the distribution will be, with increasing probability.
>>
>> Also, for small T, when a new node is added, it won't have many ranges to
>> split so won't be able to take an even slice of the data.
>>
>> For this reason T should be large.  But if it is too large, there are too
>> many slices to keep track of as you say.  The function to find which keys
>> live where becomes more expensive and operations that deal with individual
>> vnodes, e.g. repair, become slow.  (An extreme example is SELECT * LIMIT 1,
>> which when there is no data has to scan each vnode in turn in search of a
>> single row.  This is O(NT) and for even quite small T takes seconds to
>> complete.)
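
The O(NT) scaling above is easy to put numbers on. A back-of-envelope sketch
(illustrative arithmetic only, not measured Cassandra behaviour; the function
name is made up):

```python
# Rough count of vnode ranges an operation touching every vnode must
# visit, per the O(N*T) argument above. Illustrative arithmetic only.
def vnode_ranges(n_nodes, tokens_per_node):
    return n_nodes * tokens_per_node

print(vnode_ranges(6, 256))    # 1536 ranges in a six-node cluster
print(vnode_ranges(30, 256))   # 7680 ranges at thirty nodes
```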
>>
>> So 256 was chosen to be a reasonable balance.  I don't think most users
>> will find it too slow; users with extremely large clusters may need to
>> increase it.
>>
>> Richard.
>>
>>
>> On 10 June 2013 18:55, Theo Hultberg <theo@iconara.net> wrote:
>>
>>> I'm not sure I follow what you mean, or if I've misunderstood what
>>> Cassandra is telling me. Each node has 256 vnodes (or tokens, as the
>>> preferred name seems to be). When I run `nodetool status` each node is
>>> reported as having 256 vnodes, regardless of how many nodes are in the
>>> cluster. A single node cluster has 256 vnodes on the single node, a six
>>> node cluster has 256 vnodes on each machine, making 1536 vnodes in total.
>>> When I run `SELECT tokens FROM system.peers` or `nodetool ring` each node
>>> lists 256 tokens.
>>>
>>> This is different from how it works in Riak and Voldemort, if I'm not
>>> mistaken, and that is the source of my confusion.
>>>
>>> T#
>>>
>>>
>>> On Mon, Jun 10, 2013 at 4:54 PM, Milind Parikh <milindparikh@gmail.com> wrote:
>>>
>>>> There are n vnodes regardless of the size of the physical cluster.
>>>> Regards
>>>> Milind
>>>> On Jun 10, 2013 7:48 AM, "Theo Hultberg" <theo@iconara.net> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> The default number of vnodes is 256; is there any significance in this
>>>>> number? Since Cassandra's vnodes don't work like for example Riak's,
>>>>> where there is a fixed number of vnodes distributed evenly over the
>>>>> nodes, why so many? Even with a moderately sized cluster you get
>>>>> thousands of slices. Does this matter? If your cluster grows to over
>>>>> thirty machines and you start looking at ten thousand slices, would
>>>>> that be a problem? I guess that traversing a list of a thousand or ten
>>>>> thousand slices to find where a token lives isn't a huge problem, but
>>>>> are there any other up- or downsides to having a small or large number
>>>>> of vnodes per node?
>>>>>
>>>>> I understand the benefits of splitting up the ring into pieces, for
>>>>> example to be able to stream data from more nodes when bootstrapping a
>>>>> new one, but that works even if each node only has, say, 32 vnodes
>>>>> (unless your cluster is truly huge).
>>>>>
>>>>> yours,
>>>>> Theo
>>>>>
>>>>
>>>
>>
>
