incubator-cassandra-user mailing list archives

From Wei Zhu <wz1...@yahoo.com>
Subject General question regarding bootstrap and nodetool repair
Date Thu, 31 Jan 2013 18:50:40 GMT
Hi,
After messing around with my Cassandra cluster recently, I think I need a basic understanding
of how data streaming works behind the scenes.
Let's say we have a three-node cluster with RF = 3. If node 3 dies for some reason and I want
to replace it with a new node that has the same (maybe minus one) range, how is the data
streamed during bootstrap?
From what I observed, node 3 has replicas of its primary range on nodes 4 and 5, so it streams
the data from them and starts compacting it. Node 3 also holds replicas of node 2's primary
range, so it streams data from node 2 and node 4. Similarly, it holds replicas of node 1's
primary range, so data is streamed from node 1 and node 2. So during bootstrapping, it basically
gets the data from all the replicas (2 copies each). Does that mean it requires double the disk
space to hold the data? And is it true that over time those SSTables will be compacted and the
redundant data removed?
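
The replica layout described above can be sketched as follows. This is only an illustration under assumptions that are not stated in the message: a five-node ring (nodes 4 and 5 are mentioned, so more than three nodes are implied), nodes numbered in ring order, and SimpleStrategy-style placement where each primary range is also stored on the next RF - 1 nodes clockwise. The function names are hypothetical and are not part of Cassandra.

```python
def replicas_for_primary_range(owner, nodes, rf):
    """Nodes storing the range whose primary owner is `owner`.

    Assumption: SimpleStrategy-style placement, i.e. the owner plus the
    next rf - 1 nodes clockwise on the ring.
    """
    i = nodes.index(owner)
    return [nodes[(i + k) % len(nodes)] for k in range(rf)]


def ranges_held_by(node, nodes, rf):
    """Map each primary owner whose range `node` stores to the other replicas."""
    held = {}
    for owner in nodes:
        replicas = replicas_for_primary_range(owner, nodes, rf)
        if node in replicas:
            held[owner] = [n for n in replicas if n != node]
    return held


# Hypothetical five-node ring with RF = 3, listed in ring order.
nodes = [1, 2, 3, 4, 5]
held = ranges_held_by(3, nodes, rf=3)
for owner, peers in held.items():
    print(f"primary range of node {owner}: other replicas on nodes {peers}")
```

Under these assumptions node 3 stores three ranges, and each of them has two other replicas, which matches the stream sources listed above.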

If we issue nodetool repair -pr on node 3, apart from data streaming from nodes 4 and 5 to node 3,
we also see data streamed between nodes 4 and 5, since they hold the replicas. But I don't see
any log entries about "merkle tree calculation" on nodes 4 and 5. I am just wondering how they
know what data to stream in order to repair nodes 4 and 5?

Thanks.
-Wei
