cassandra-user mailing list archives

From Jeff Jirsa
Subject Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens
Date Sat, 08 Sep 2018 17:00:28 GMT
Virtual nodes accomplish two primary goals:

1) They make it easier to gradually add or remove cluster capacity by distributing the
new host's capacity around the ring in smaller increments

2) They increase the number of sources for streaming, which speeds up bootstrap and decommission

Whether either of these actually holds depends on a number of factors, like your
cluster size (for #1) and your replication factor (for #2). If you have 4 hosts and 4 tokens
per host and add a 5th host, you’ll probably add a neighbor near each existing host (#1)
and stream from every other host (#2), so that’s great. If you have 20 hosts and add a new
host with 4 tokens, most of your existing ranges won’t change at all - you’re nominally
adding 5% of your cluster capacity but you won’t see a 5% improvement because you don’t
have enough tokens to move 5% of your ranges. If you had 32 tokens, you’d probably actually
see that 5% improvement, because you’d likely add a new range near each of the existing

Going down to 1 token would mean you’d probably need to manually move tokens after each
bootstrap to rebalance, which is fine, it just takes more operator awareness.
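
To make the 20-host example concrete, here is a rough Python sketch. It assumes purely random token selection and looks only at primary range ownership, ignoring replication and any token allocation algorithm. The helper names (build_ring, add_host, and so on) are hypothetical, not Cassandra APIs. It counts how many existing hosts hand data to the new host and how much of the ring the new host ends up owning.

```python
import random

MIN_TOKEN = -2**63   # smallest Murmur3Partitioner token
RING_SIZE = 2**64    # number of distinct tokens on the ring


def random_tokens(count, rng):
    """Pick `count` random tokens (assumption: no allocation algorithm in use)."""
    return [rng.randrange(MIN_TOKEN, MIN_TOKEN + RING_SIZE) for _ in range(count)]


def primary_widths(ring):
    """Map each host to the total width of the ranges it owns as primary.

    `ring` maps token -> host; a token owns the range (previous token, token].
    """
    tokens = sorted(ring)
    widths = {}
    for i, tok in enumerate(tokens):
        # tokens[i - 1] wraps around to the last token when i == 0.
        width = (tok - tokens[i - 1]) % RING_SIZE or RING_SIZE
        widths[ring[tok]] = widths.get(ring[tok], 0) + width
    return widths


def build_ring(hosts, num_tokens, rng):
    """Create a ring of `hosts` hosts, each with `num_tokens` random tokens."""
    ring = {}
    for h in range(hosts):
        for tok in random_tokens(num_tokens, rng):
            ring[tok] = "host%d" % h
    return ring


def add_host(ring, new_host, num_tokens, rng):
    """Return (new host's share of the ring, existing hosts whose share shrank)."""
    before = primary_widths(ring)
    grown = dict(ring)
    for tok in random_tokens(num_tokens, rng):
        grown[tok] = new_host
    after = primary_widths(grown)
    shrunk = sorted(h for h in before if after.get(h, 0) < before[h])
    return after[new_host] / RING_SIZE, shrunk


rng = random.Random(2018)
for num_tokens in (4, 32):
    ring = build_ring(20, num_tokens, rng)
    share, shrunk = add_host(ring, "new-host", num_tokens, rng)
    print("%2d tokens: new host owns %.1f%% of the ring, %d of 20 hosts gave up data"
          % (num_tokens, 100 * share, len(shrunk)))
```

With 4 tokens the new host can split at most 4 existing ranges, so at most 4 of the 20 hosts give anything up. With 32 tokens it can reach most of them.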

I don’t know how DSE calculates which replication factor to use for their token allocation
logic, maybe they guess or take the highest or something. Cassandra doesn’t - we require
you to be explicit, but we could probably do better here.

> On Sep 8, 2018, at 8:17 AM, Oleksandr Shulgin wrote:
>> On Sat, 8 Sep 2018, 14:47 Jonathan Haddad wrote:
>> 256 tokens is a pretty terrible default setting especially post 3.0.  I recommend
>> folks use 4 tokens for new clusters,
> I wonder, why not set it all the way down to 1 then? What's the key difference
> once you have so few vnodes?
>> with some caveats.
> And those are?
>> When you fire up a cluster, there's no way to make the initial tokens be distributed
>> evenly, you'll get random ones.  You'll want to set them explicitly using:
>> python -c 'print([str(((2**64 // 4) * i) - 2**63) for i in range(4)])'
>> After you fire up the first seed, create a keyspace using RF=3 (or whatever you're
>> planning on using) and set allocate_tokens_for_keyspace to that keyspace in your config, and
>> join the rest of the nodes.  That gives even distribution.
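
A minimal Python 3 sketch of the evenly spaced token calculation quoted above, assuming the Murmur3Partitioner token range of -2**63 to 2**63 - 1. The function names are hypothetical. The output is formatted as a comma-separated list, on the assumption that it will be pasted into the initial_token setting in cassandra.yaml for the first seed.

```python
MIN_TOKEN = -2**63   # smallest Murmur3Partitioner token
RING_SIZE = 2**64    # number of distinct tokens on the ring


def even_tokens(num_tokens):
    """Return `num_tokens` tokens spaced evenly around the ring."""
    # Integer division keeps the results exact; `/` would produce
    # floats on Python 3 and lose precision at this magnitude.
    step = RING_SIZE // num_tokens
    return [MIN_TOKEN + step * i for i in range(num_tokens)]


def initial_token_line(num_tokens):
    """Format the tokens as a cassandra.yaml line (assumed target setting)."""
    return "initial_token: " + ",".join(str(t) for t in even_tokens(num_tokens))


print(initial_token_line(4))
```

This only covers the first node. For the remaining nodes, the advice quoted above is to rely on allocate_tokens_for_keyspace rather than hand-computed tokens.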
> Do you possibly know if the DSE-style option which doesn't require a keyspace to be there
> also works to allocate evenly distributed tokens for the very first seed node?
> Thanks,
> --
> Alex
