cassandra-user mailing list archives

From Jürgen Albersdorfer <>
Subject Re: Is it possible / makes it sense to limit concurrent streaming during bootstrapping new nodes?
Date Tue, 20 Feb 2018 21:09:36 GMT
We archive data in order to make assumptions about it in the future. So yes, we expect to grow
continuously. In the meantime I have learned to aim for predictable growth per partition rather than
unpredictably large partitions. Today we are adding 250,000,000 records per day to a single
table, and we are heading towards about 100 times that number this year. A partition
will grow by one record a day, which should give us good horizontal scalability, but means 250,000,000
partitions. I hope these numbers should not make me feel uncomfortable :)

Sent from my iPhone

> On 20.02.2018 at 21:39, Jeff Jirsa <> wrote:
> At a past job, we set the limit at around 60 hosts per cluster - anything bigger than
> that got single token. Anything smaller, and we'd just tolerate the inconveniences of vnodes.
> But that was before the new vnode token allocation went into 3.0, and really assumed things
> that may not be true for you (it was a cluster that started at 60 hosts and grew up to 480
> in steps, so we'd want to grow quickly - having single token allowed us to grow from 60-120
> in 2 days, and then 120-180 in 2 days, and so on).
> Are you always going to be growing, or is it a short/temporary thing?
> There are users of vnodes (at big, public companies) that go up into the hundreds of
> Most people running cassandra start sharding clusters rather than going past a thousand
> or so nodes - I know there's at least one person I talked to in IRC with a 1700 host cluster,
> but that'd be beyond what I'd ever do personally.
>> On Tue, Feb 20, 2018 at 12:34 PM, Jürgen Albersdorfer <> wrote:
>> Thanks Jeff,
>> your answer is really not what I expected to learn, since it means more manual work
>> as soon as we really start using C*. But I'm happy to learn it now, while I still have
>> time to acquire the necessary skills and ask the right questions about how to handle
>> big data correctly with C* before we actually start using it, and I'm glad to have people
>> like you around who care about these questions. Thanks. This still convinces me that I have
>> bet on the right horse, even if it might become a rough ride.
>> By the way, is it possible to migrate towards smaller token ranges? What is the
>> recommended way of doing so? And what number of nodes is the typical 'break-even' point?
>> Sent from my iPhone
>>> On 20.02.2018 at 21:05, Jeff Jirsa <> wrote:
>>> The scenario you describe is the typical point where people move away from vnodes
>>> and towards single-token-per-node (or a much smaller number of vnodes).
>>> The default setting puts you in a situation where virtually all hosts are adjacent/neighbors
>>> to all others (at least until you're way into the hundreds of hosts), which means you'll stream
>>> from nearly all hosts. If you drop the number of vnodes from ~256 to ~4 or ~8 or ~16, you'll
>>> see the number of streams drop as well.
>>> Many people with "large" clusters statically allocate tokens to make it predictable
>>> - if you have a single token per host, you can add multiple hosts at a time, each streaming
>>> from a small number of neighbors, without overlap.
>>> It takes a bit more tooling (or manual token calculation) outside of cassandra,
>>> but works well in practice for "large" clusters.
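To get a feel for why fewer vnodes means fewer streams, here is a rough Python sketch. It is a toy model, not Cassandra's actual streaming logic: it assumes randomly placed tokens on the Murmur3 ring, ignores replication factor and rack placement, and treats the current owner of each range the new node takes over as the only streaming source. The function name `streaming_sources` is made up for illustration.

```python
import bisect
import random

# Token range of Murmur3Partitioner: -2^63 .. 2^63 - 1
RING_MIN = -2**63
RING_MAX = 2**63 - 1


def streaming_sources(num_hosts, vnodes, seed=0):
    """Count distinct hosts a joining node would take ranges from (toy model)."""
    rng = random.Random(seed)
    # Existing ring: every host owns `vnodes` randomly placed tokens.
    ring = sorted(
        (rng.randint(RING_MIN, RING_MAX), host)
        for host in range(num_hosts)
        for _ in range(vnodes)
    )
    tokens = [token for token, _ in ring]
    sources = set()
    for _ in range(vnodes):
        new_token = rng.randint(RING_MIN, RING_MAX)
        # The range containing new_token is owned by the first existing
        # token >= new_token, wrapping around at the end of the ring.
        idx = bisect.bisect_left(tokens, new_token) % len(ring)
        sources.add(ring[idx][1])
    return len(sources)


if __name__ == "__main__":
    for vnodes in (256, 16, 8, 4):
        print(vnodes, "vnodes ->", streaming_sources(15, vnodes), "source hosts")
```

In this model a joining node can never have more sources than it has tokens, so with 4 vnodes it touches at most 4 of the 15 hosts, while with 256 vnodes it is overwhelmingly likely to touch all of them.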
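The manual token calculation mentioned here can be sketched in a few lines of Python. This assumes the Murmur3Partitioner (token range -2^63 to 2^63 - 1) and one token per host, with the results placed into each node's `initial_token` setting and `num_tokens` set to 1 in cassandra.yaml. The helper names `balanced_tokens` and `tokens_for_doubling` are hypothetical, and this is only one possible way to lay out the ring.

```python
RING_MIN = -2**63      # smallest Murmur3Partitioner token
RING_SIZE = 2**64      # number of distinct tokens on the ring


def balanced_tokens(num_hosts):
    """Evenly spaced tokens, one per host, for a single-token cluster."""
    if num_hosts < 1:
        raise ValueError("num_hosts must be at least 1")
    return [RING_MIN + (i * RING_SIZE) // num_hosts for i in range(num_hosts)]


def tokens_for_doubling(current_hosts):
    """Tokens for new hosts when doubling a balanced single-token cluster.

    Each new token bisects one existing range, so every new host takes
    over part of a different range and the existing tokens stay put.
    """
    return balanced_tokens(2 * current_hosts)[1::2]


if __name__ == "__main__":
    for token in balanced_tokens(4):
        print("initial_token:", token)
    print("new hosts when growing 4 -> 8:", tokens_for_doubling(4))
```

Doubling keeps the existing tokens unchanged, which fits the 60-to-120 growth step described elsewhere in this thread. Growing by a different factor (say 120 to 180) leaves the ring unbalanced unless existing nodes are moved as well.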
>>>> On Tue, Feb 20, 2018 at 4:42 AM, Jürgen Albersdorfer <> wrote:
>>>> Hi, I'm wondering whether it is possible, and whether it would make sense, to limit
>>>> concurrent streaming when joining a new node to the cluster.
>>>> I'm currently operating a 15-node C* cluster (v3.11.1) and joining another
>>>> node every day.
>>>> 'nodetool netstats' shows that the new node always streams data from all other nodes.
>>>> How far will this scale? What happens when I have hundreds or even thousands
>>>> of nodes?
>>>> Does anyone have experience with such a situation?
>>>> Thanks, and regards
>>>> Jürgen
