cassandra-user mailing list archives

From Sergi Vladykin <sergi.vlady...@gmail.com>
Subject Re: Data rebalancing algorithm
Date Thu, 24 Dec 2015 21:31:10 GMT
Thanks a lot for your answers!

Paulo, I'll take a look at the classes you suggested.

Jack, the link you provided lacks a description of how virtual nodes are
mapped to physical sstables/indexes on disk.

To be more precise, here are my more detailed questions:

1. How are vnodes mapped to sstables and indexes? Is each vnode a separate
part of an sstable, is the data from all vnodes simply mixed together in one
sstable, or is it something else?

2. As far as I can see, Cassandra does not have a predefined, constant total
number of vnodes for the whole cluster, right? Does that mean that on
rebalancing some of the data already mapped to existing vnodes will be
remapped to new vnodes on the new node?

3. How long can rebalancing take if we have, let's say, 1TB of data on a
single node and we add one more node to the cluster?
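
Question 2 can be made concrete with a toy model. The sketch below is not
Cassandra's code: the ring size, the node names, the helper functions and the
choice of 256 tokens per node are all assumptions made for illustration. It
only models the idea described in the quoted reply, that each node picks
random tokens and owns the range ending at each of its tokens, and then
measures how many sampled keys change owner when a second node joins.

```python
import bisect
import random

# Toy token space (assumption). The real Murmur3Partitioner uses
# tokens in the range -2**63 .. 2**63 - 1.
RING_SIZE = 2**64

# Hypothetical per-node token count, standing in for num_tokens.
NUM_TOKENS = 256


def random_tokens(rng, count):
    """Pick `count` random tokens, mimicking random token assignment."""
    return [rng.randrange(RING_SIZE) for _ in range(count)]


def build_ring(node_tokens):
    """Turn {node: [tokens]} into parallel sorted lists (tokens, owners)."""
    pairs = sorted((t, n) for n, toks in node_tokens.items() for t in toks)
    return [t for t, _ in pairs], [n for _, n in pairs]


def owner(ring, key_token):
    """A key belongs to the node holding the first ring token >= the key's
    token, wrapping around to the smallest token at the end of the ring."""
    tokens, owners = ring
    i = bisect.bisect_left(tokens, key_token) % len(tokens)
    return owners[i]


rng = random.Random(42)
node1_tokens = random_tokens(rng, NUM_TOKENS)
node2_tokens = random_tokens(rng, NUM_TOKENS)

before = build_ring({"node1": node1_tokens})
after = build_ring({"node1": node1_tokens, "node2": node2_tokens})

# Sample random key tokens and count how many change owner after the join.
sample = [rng.randrange(RING_SIZE) for _ in range(10_000)]
moved = [k for k in sample if owner(before, k) != owner(after, k)]
moved_fraction = len(moved) / len(sample)

print(f"fraction of sampled keys that moved to node2: {moved_fraction:.2f}")
```

In this model node1 keeps all of its original tokens; the new node's tokens
split some of the existing ranges, and only the keys falling into the split-off
parts change owner. Whether real Cassandra behaves exactly this way is what
question 2 is asking.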

Sergi


2015-12-24 19:26 GMT+03:00 Jack Krupansky <jack.krupansky@gmail.com>:

> Read details here:
>
> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
>
>
> -- Jack Krupansky
>
> On Thu, Dec 24, 2015 at 11:09 AM, Paulo Motta <pauloricardomg@gmail.com>
> wrote:
>
>> The new node will own some parts (ranges) of the ring according to the
>> ring tokens the node is responsible for. These tokens are defined from the
>> yaml property initial_token (manual assignment) or num_tokens (random
>> assignment).
>>
>> During the bootstrap process, raw data from the sstable sections containing
>> the ranges the new node is responsible for is transferred from the nodes
>> that previously owned those ranges, so the source sstables are rebuilt on
>> the joining node. After each sstable is transferred, the new node rebuilds
>> its primary and secondary indexes, bloom filters, etc., and at the end of
>> the bootstrap process the new sstables are added to the live data set.
>>
>> See org.apache.cassandra.dht.BootStrapper.java and
>> org.apache.cassandra.streaming.StreamReceiveTask of the trunk branch for
>> more information.
>>
>> ps: I don't particularly recall any document with specific details, so if
>> anyone knows of one, please feel free to share. If you want more
>> theoretical information, see the ring membership sections of the cassandra
>> and/or dynamo paper.
>>
>>
>> 2015-12-24 13:14 GMT-02:00 Sergi Vladykin <sergi.vladykin@gmail.com>:
>>
>>> Guys,
>>>
>>> I was not able to find a detailed description of the data rebalancing
>>> algorithm in the docs or on Google.
>>>
>>> I mean how Cassandra moves SSTables when a new node connects to the
>>> cluster, how primary and secondary indexes are transferred to this new
>>> node, etc.
>>>
>>> Can anyone provide relevant links please or just reply here?
>>>
>>> I can read source code of course, but it would be nice if someone could
>>> answer right away :)
>>>
>>> Sergi
>>>
>>
>>
>
