cassandra-user mailing list archives

From Jeff Jirsa <>
Subject Re: Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?
Date Tue, 20 Feb 2018 20:39:21 GMT
At a past job, we set the limit at around 60 hosts per cluster - anything
bigger than that got single token. Anything smaller, and we'd just tolerate
the inconveniences of vnodes. But that was before the new vnode token
allocation went into 3.0, and really assumed things that may not be true
for you (it was a cluster that started at 60 hosts and grew up to 480 in
steps, so we'd want to grow quickly - having single token allowed us to
grow from 60-120 in 2 days, and then 120-180 in 2 days, and so on).
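
A rough sketch of the arithmetic behind that kind of doubling, assuming the default Murmur3Partitioner (token range -2^63 to 2^63-1) and one token per host. The function names are made up for illustration and this is not the tooling we actually used. Each new host takes the midpoint of an existing range, so it only takes over part of one existing range and no two new hosts split the same range:

```python
# Token range of Murmur3Partitioner, Cassandra's default partitioner.
MIN_TOKEN = -(2**63)
RING_SIZE = 2**64


def wrap(token):
    """Fold a token back into the valid Murmur3 range."""
    return (token - MIN_TOKEN) % RING_SIZE + MIN_TOKEN


def even_tokens(node_count):
    """Hypothetical helper: one evenly spaced token per node."""
    return [MIN_TOKEN + (i * RING_SIZE) // node_count for i in range(node_count)]


def doubling_tokens(existing):
    """Hypothetical helper: one new token at the midpoint of each existing range."""
    ring = sorted(existing)
    new_tokens = []
    for i, start in enumerate(ring):
        # The range after the last token wraps around to the first one.
        end = ring[(i + 1) % len(ring)]
        # A single-node ring has width 0 after the modulo; treat it as the full ring.
        width = (end - start) % RING_SIZE or RING_SIZE
        new_tokens.append(wrap(start + width // 2))
    return new_tokens


if __name__ == "__main__":
    old = even_tokens(60)
    new = doubling_tokens(old)
    print(len(old), "existing tokens,", len(new), "new tokens")
    print("first new token:", new[0])
```

Each resulting value would go into `initial_token` in the matching new host's cassandra.yaml, with `num_tokens: 1`.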

Are you always going to be growing, or is it a short/temporary thing?
There are users of vnodes (at big, public companies) that go up into the
hundreds of nodes.

Most people running cassandra start sharding clusters rather than going
past a thousand or so nodes - I know there's at least one person I talked
to on IRC with a 1700 host cluster, but that'd be beyond what I'd ever do.

On Tue, Feb 20, 2018 at 12:34 PM, Jürgen Albersdorfer <> wrote:

> Thanks Jeff,
> your answer is really not what I expected to learn - it means even more
> manual work as soon as we really start using C*. But I'm happy to be able
> to learn it now and still have time to learn the necessary skills and ask
> the right questions on how to correctly drive big data with C* before we
> actually start using it, and I'm glad to have people like you around who
> care about these questions. Thanks. This still convinces me that I have bet
> on the right horse, even if it might become a rough ride.
> By the way, is it possible to migrate towards smaller token ranges?
> What is the recommended way of doing so? And what number of nodes is the
> typical 'break even'?
> Sent from my iPhone
> On 20.02.2018 at 21:05, Jeff Jirsa <> wrote:
> The scenario you describe is the typical point where people move away from
> vnodes and towards single-token-per-node (or a much smaller number of
> vnodes).
> The default setting puts you in a situation where virtually all hosts are
> adjacent/neighbors to all others (at least until you're way into the
> hundreds of hosts), which means you'll stream from nearly all hosts. If you
> drop the number of vnodes from ~256 to ~4 or ~8 or ~16, you'll see the
> number of streams drop as well.
> Many people with "large" clusters statically allocate tokens to make it
> predictable - if you have a single token per host, you can add multiple
> hosts at a time, each streaming from a small number of neighbors, without
> overlap.
> It takes a bit more tooling (or manual token calculation) outside of
> cassandra, but works well in practice for "large" clusters.
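
A toy simulation of the neighbor effect described above may help show why the stream count drops with fewer vnodes. It is only a sketch under loud assumptions: tokens are assigned purely at random (as with the pre-3.0 allocation), replication factor is ignored, and a "neighbor" is just a host owning a token directly next to one of yours on the ring. The function name is hypothetical.

```python
import random

RING_SIZE = 2**64


def distinct_neighbors(host_count, num_tokens, seed=0):
    """Average number of distinct other hosts adjacent to a host on the ring.

    Assumption: random token assignment and no replication, so this
    understates the number of real stream sources.
    """
    rng = random.Random(seed)
    # Build the ring as sorted (token, host) pairs.
    ring = sorted(
        (rng.randrange(RING_SIZE), host)
        for host in range(host_count)
        for _ in range(num_tokens)
    )
    neighbors = {host: set() for host in range(host_count)}
    size = len(ring)
    for i, (_, host) in enumerate(ring):
        # Look at the token before and after this one, wrapping at the ends.
        for j in (i - 1, (i + 1) % size):
            other = ring[j][1]
            if other != host:
                neighbors[host].add(other)
    return sum(len(s) for s in neighbors.values()) / host_count


if __name__ == "__main__":
    for tokens in (1, 4, 16, 256):
        avg = distinct_neighbors(15, tokens)
        print(f"15 hosts, num_tokens={tokens}: ~{avg:.1f} adjacent hosts on average")
```

With 15 hosts and 256 tokens each, essentially every host ends up adjacent to every other host, which matches what 'nodetool netstats' shows in the original question.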
> On Tue, Feb 20, 2018 at 4:42 AM, Jürgen Albersdorfer <> wrote:
>> Hi, I'm wondering whether it is possible, and whether it would make sense,
>> to limit concurrent streaming when joining a new node to the cluster.
>> I'm currently operating a 15-node C* cluster (v3.11.1) and joining
>> another node every day.
>> 'nodetool netstats' shows it always streams data from all other nodes.
>> How far will this scale? What happens when I have hundreds or even
>> thousands of nodes?
>> Does anyone have experience with such a situation?
>> Thanks, and regards
>> Jürgen
