cassandra-user mailing list archives

From Jeff Jirsa <jji...@gmail.com>
Subject Re: Rapid scaleup of cassandra nodes with snapshots and initial_token in the yaml
Date Thu, 15 Feb 2018 17:24:21 GMT
Short answer is "yes, with caveats". I recall a talk about this from
Cassandra Summit ~2014 or so. I THINK it was from Instaclustr, but I'm not
positive. Maybe Ben or Kurt or someone over there has more info (if it
really was them)?

On Wed, Feb 14, 2018 at 9:40 AM, Carl Mueller <carl.mueller@smartthings.com>
wrote:

> https://stackoverflow.com/questions/48776589/cassandra-cant-one-use-snapshots-to-rapidly-scale-out-a-cluster/48778179#48778179
>
> So the basic question is, if one records tokens and snapshots from an
> existing node, via:
>
> nodetool ring | grep ip_address_of_node | awk '{print $NF ","}' | xargs
>
>
> for the desired node IP
>
> then takes snapshots
>
> then transfers the snapshots to a new node (not yet attached to cluster)
>
> sets up initial_token in the yaml
>
> sets up schema to match
>
> then has it join the cluster
>
> Would that allow quick scaleup of nodes/replication of data? I don't care
> if the vnode map changes after the initial join, or data starts being
> streamed off as it rebalances, as the cluster
>
> Is there an issue if the vnode tokens for two nodes are identical? Do
> they have to be distinct for each node?
> Is it that it mucks with the RF since there will be a greater RF than
> normal?
> Is this just not that practically faster than an sstable load?
>
> Basically, I was wondering if we just use this to double the number of
> nodes with identical copies of the node data via snapshots, and then later
> on cassandra can pare down which nodes own which data.
>
>
>
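
For reference, the token-recording step quoted above could be sketched as a small shell function. This is only an illustration, not something from the thread: the name tokens_for_node is hypothetical, and it assumes that `nodetool ring` prints the node address in the first column and the token in the last column. It matches the address exactly, whereas a plain grep for 10.0.0.1 would also pick up 10.0.0.10.

```shell
# Hypothetical helper (not part of Cassandra): reads `nodetool ring` output
# on stdin and prints an initial_token line for the node address in $1.
# Assumes the address is the first column and the token is the last column.
tokens_for_node() {
    awk -v ip="$1" '
        # Exact match on the address column, then collect the last field.
        $1 == ip { tokens = tokens (tokens == "" ? "" : ",") $NF }
        # Print nothing at all if the address never appeared.
        END { if (tokens != "") print "initial_token: " tokens }
    '
}
```

It would be used as `nodetool ring | tokens_for_node 10.0.0.1`, with the resulting line pasted into cassandra.yaml on the new node before it starts.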
