cassandra-user mailing list archives

From Paulo Motta <pauloricard...@gmail.com>
Subject Re: Data rebalancing algorithm
Date Thu, 24 Dec 2015 16:09:21 GMT
The new node will own the parts (ranges) of the ring determined by the
tokens it is assigned. These tokens come from the cassandra.yaml property
initial_token (manual assignment) or num_tokens (random assignment).

During the bootstrap process, the sections of sstables containing the
ranges the new node is responsible for are streamed as raw data from the
nodes that previously owned those ranges, so the source sstables are
rebuilt on the joining node. After each sstable is transferred, the new
node rebuilds its primary and secondary indexes, bloom filters, etc., and
at the end of the bootstrap process the new sstables are added to the live
data set.
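
The flow described above can be sketched as a toy model. This is only an
illustration under stated assumptions: an "sstable" is a list of
(token, key, value) rows sorted by token, ranges are non-wrapping
(start, end] pairs, and a plain set stands in for a bloom filter. None of
these names correspond to real Cassandra classes or APIs.

```python
from bisect import bisect_right

def section_for_range(rows, start, end):
    """Return the rows whose token satisfies start < token <= end."""
    tokens = [row[0] for row in rows]
    return rows[bisect_right(tokens, start):bisect_right(tokens, end)]

def rebuild_components(rows):
    """Rebuild per-sstable structures on the receiving side (toy versions)."""
    return {
        "rows": rows,
        "index": {key: pos for pos, (_, key, _) in enumerate(rows)},
        "filter": {key for _, key, _ in rows},  # stand-in for a bloom filter
    }

def bootstrap(source_sstables, owned_ranges):
    """Receive every matching section, then publish them all at the end."""
    pending = []
    for rows in source_sstables:
        for start, end in owned_ranges:
            section = section_for_range(rows, start, end)
            if section:
                pending.append(rebuild_components(section))
    live = []
    live.extend(pending)  # added to the live set only after all transfers
    return live

source = [
    [(10, "a", 1), (60, "b", 2), (120, "c", 3)],
    [(30, "d", 4), (220, "e", 5)],
]
live = bootstrap(source, [(0, 50), (200, 250)])
print([sstable["rows"] for sstable in live])
```

The point of keeping a separate pending list is that nothing becomes
readable on the new node until the whole bootstrap has finished, which
mirrors the "added to the live data set at the end" step.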

See org.apache.cassandra.dht.BootStrapper and
org.apache.cassandra.streaming.StreamReceiveTask in the trunk branch for
more information.

PS: I don't recall any document with specific details, so if anyone knows
of one, please feel free to share. If you want more theoretical
information, see the ring membership sections of the Cassandra and/or
Dynamo papers.

2015-12-24 13:14 GMT-02:00 Sergi Vladykin <sergi.vladykin@gmail.com>:

> Guys,
>
> I was not able to find in docs or in google detailed description of data
> rebalancing algorithm.
>
> I mean how Cassandra moves SSTables when new node connects to the cluster,
> how primary and secondary indexes are getting transferred to this new node,
> etc..
>
> Can anyone provide relevant links please or just reply here?
>
> I can read source code of course, but it would be nice if someone could
> answer right away :)
>
> Sergi
>
