cassandra-user mailing list archives

From Chad Johnson <chad.johnso...@gmail.com>
Subject Bootstrapping
Date Wed, 10 Aug 2011 21:54:16 GMT
Hi,

I have a 15-node cluster with RF=3 running version 0.7.5. I am planning to perform some
filesystem maintenance on each of the nodes. The filesystem in question is on the partition
holding the keyspace data, so the maintenance will destroy all the SSTables for our keyspace.
Rather than back up all the data to a backup disk and restore it, my plan is to
bring each node down, perform the maintenance, keep the original initial_token, set auto_bootstrap
to true, and let Cassandra repopulate the data through the streaming process. Nodes in the
cluster will have a load of about 250 to 300 GB.
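
For concreteness, the settings I have in mind in cassandra.yaml on a rebuilt node would look
roughly like the fragment below. The token value is only a placeholder, not a real token from
our ring, and all other settings are assumed to stay as they were before the maintenance.

```yaml
# cassandra.yaml on the node being rebuilt (sketch, other settings unchanged)

# Keep the token this node owned before the maintenance.
# PLACEHOLDER: substitute the node's actual existing token.
initial_token: <existing token for this node>

# Ask the node to stream its data from the other replicas on startup.
auto_bootstrap: true
```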

I have a couple of questions regarding bootstrapping and the streaming process.

1. I realize this will put a heavier I/O load on the replica nodes, which have to anticompact the
CFs, but what kind of load does this put on the JVM? Are there any gotchas I should be aware
of to prevent long GC times or OOM exceptions on the replica nodes?
2. If the initial_token is not changed, is it correct to assume that anticompaction will occur
only on the replica nodes and not throughout the cluster, as the key space has not been
modified?
3. Documentation at http://wiki.apache.org/cassandra/Operations says that the thrift port
is not active on the bootstrapping node during the streaming process. What is the process
that brings the node up-to-date with mutations that occurred during the time of the bootstrap?
Maybe it's only reads that are disabled and writes are allowed?
4. What happens if schema changes (add/drop column families) occur in the cluster while the
bootstrap is in progress?
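
To make question 2 concrete, here is a minimal sketch of which nodes I would expect to be
candidate streaming sources for a rebuilt node. It assumes simple ring-order replica placement
(each range is stored on its primary owner and the next RF-1 nodes clockwise) and numbers the
nodes 0 to 14 in ring order. The function names are made up for illustration and are not
Cassandra APIs, and a different replication strategy would change the result.

```python
def replica_nodes(primary, num_nodes, rf):
    """Nodes storing the range whose primary owner is `primary` (ring order assumed)."""
    return [(primary + k) % num_nodes for k in range(rf)]


def streaming_sources(node, num_nodes, rf):
    """Other nodes holding a replica of any range that `node` stores."""
    sources = set()
    # `node` stores its own primary range plus those of its rf-1 predecessors.
    for k in range(rf):
        primary = (node - k) % num_nodes
        sources.update(replica_nodes(primary, num_nodes, rf))
    sources.discard(node)  # the rebuilt node cannot stream from itself
    return sorted(sources)


if __name__ == "__main__":
    # 15 nodes, RF=3: rebuilding node 5 would involve only its ring neighbours.
    print(streaming_sources(5, 15, 3))
```

Under these assumptions only the two nodes on either side of the rebuilt node hold the data it
needs, which is why I expect the rest of the cluster to be unaffected.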

Thanks for your help.

Chad