incubator-cassandra-user mailing list archives

From Theo Hultberg <>
Subject Re: Why so many vnodes?
Date Tue, 11 Jun 2013 05:27:01 GMT
Thanks, that makes sense, but I assume that in your last sentence you mean
decrease it for large clusters, not increase it?


On Mon, Jun 10, 2013 at 11:02 PM, Richard Low <> wrote:

> Hi Theo,
> The number 256 (let's call it T, and the number of nodes N) was chosen to
> give good load balancing for random token assignments for most cluster
> sizes.  For small T, a random choice of initial tokens will in most cases
> give a poor distribution of data.  The larger T is, the closer to uniform
> the distribution will be, with increasing probability.
> Also, for small T, when a new node is added, it won't have many ranges to
> split so won't be able to take an even slice of the data.
> For this reason T should be large.  But if it is too large, there are too
> many slices to keep track of as you say.  The function to find which keys
> live where becomes more expensive and operations that deal with individual
> vnodes e.g. repair become slow.  (An extreme example is SELECT * LIMIT 1,
> which when there is no data has to scan each vnode in turn in search of a
> single row.  This is O(NT) and for even quite small T takes seconds to
> complete.)
> So 256 was chosen to be a reasonable balance.  I don't think most users
> will find it too slow; users with extremely large clusters may need to
> increase it.
> Richard.
> On 10 June 2013 18:55, Theo Hultberg <> wrote:
>> I'm not sure I follow what you mean, or if I've misunderstood what
>> Cassandra is telling me. Each node has 256 vnodes (or tokens, as the
>> preferred name seems to be). When I run `nodetool status` each node is
>> reported as having 256 vnodes, regardless of how many nodes are in the
>> cluster. A single node cluster has 256 vnodes on the single node, a six
>> node cluster has 256 vnodes on each machine, making 1536 vnodes in total.
>> When I run `SELECT tokens FROM system.peers` or `nodetool ring` each node
>> lists 256 tokens.
>> This is different from how it works in Riak and Voldemort, if I'm not
>> mistaken, and that is the source of my confusion.
>> T#
>> On Mon, Jun 10, 2013 at 4:54 PM, Milind Parikh <> wrote:
>>> There are n vnodes regardless of the size of the physical cluster.
>>> Regards
>>> Milind
>>> On Jun 10, 2013 7:48 AM, "Theo Hultberg" <> wrote:
>>>> Hi,
>>>> The default number of vnodes is 256, is there any significance in this
>>>> number? Since Cassandra's vnodes don't work like for example Riak's, where
>>>> there is a fixed number of vnodes distributed evenly over the nodes, why
>>>> so many? Even with a moderately sized cluster you get thousands of slices.
>>>> Does this matter? If your cluster grows to over thirty machines and you
>>>> start looking at ten thousand slices, would that be a problem? I guess that
>>>> traversing a list of a thousand or ten thousand slices to find where a
>>>> token lives isn't a huge problem, but are there any other up or downsides
>>>> to having a small or large number of vnodes per node?
>>>> I understand the benefits of splitting up the ring into pieces, for
>>>> example to be able to stream data from more nodes when bootstrapping a new
>>>> one, but that works even if each node only has say 32 vnodes (unless your
>>>> cluster is truly huge).
>>>> yours,
>>>> Theo
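
The load-balancing argument in Richard's reply (random tokens give a poor
distribution for small T and a near-uniform one for large T) can be checked
with a rough simulation. This is only a sketch under stated assumptions: the
64-bit ring size, the function names `ownership` and `imbalance`, and the
seeds are made up for illustration, and it models token ranges only, not
replication or actual data sizes.

```python
import random

# Assumed ring size for illustration only (a 64-bit token space).
RING = 2 ** 64


def ownership(num_nodes, tokens_per_node, rng):
    """Return the fraction of the ring owned by each node with random tokens."""
    # Give every node T random tokens, remembering which node holds each one.
    tokens = []
    for node in range(num_nodes):
        for _ in range(tokens_per_node):
            tokens.append((rng.randrange(RING), node))
    tokens.sort()

    owned = [0] * num_nodes
    # Each token owns the range from the previous token up to itself.
    # Starting from (last token - RING) makes the first token also own
    # the wrap-around range that follows the last token.
    prev = tokens[-1][0] - RING
    for token, node in tokens:
        owned[node] += token - prev
        prev = token
    return [o / RING for o in owned]


def imbalance(num_nodes, tokens_per_node, seed=0):
    """Ratio of the largest node share to the ideal share 1/N (1.0 is perfect)."""
    rng = random.Random(seed)
    shares = ownership(num_nodes, tokens_per_node, rng)
    return max(shares) * num_nodes


if __name__ == "__main__":
    # Six nodes, as in the example cluster discussed in the thread.
    for t in (1, 4, 32, 256):
        print(t, round(imbalance(6, t), 3))
```

Running it for a six-node cluster prints the imbalance for T = 1, 4, 32 and
256; the value should tend towards 1.0 as T grows, though any single random
draw can deviate. The sorted token list also has N*T entries, which is the
bookkeeping cost that grows with T.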
