cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Bootstrapping
Date Thu, 11 Aug 2011 01:59:38 GMT
First, upgrade from 0.7.5 if possible. This is as good a reason as any https://github.com/apache/cassandra/blob/cassandra-0.7.8/CHANGES.txt#L58

Can you copy the SSTables off the node and then just bring them back? It will be *a lot*
faster than using nodetool repair (drain the node first to clear the commit log). Or, if you
have a spare machine, perform a rolling migration.

If at all possible I would try to do it as described above. It will be much, much
easier.
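
A minimal sketch of the copy-off / copy-back sequence above, written as a plan that is only printed, never executed. The data directory, the backup path, and the use of rsync are assumptions for illustration; adjust them to your install. Stopping and starting Cassandra is left as a manual step because it depends on how the node is run.

```python
def copy_and_restore_plan(host,
                          data_dir="/var/lib/cassandra/data",      # assumed data path
                          backup_dir="/mnt/backup/cassandra-data"):  # hypothetical backup path
    """Return the ordered steps as (description, argv) pairs. Nothing is executed."""
    return [
        ("drain the node so the commit log is cleared",
         ["nodetool", "-h", host, "drain"]),
        # Stop the Cassandra process here before copying.
        ("copy the SSTables off the node",
         ["rsync", "-a", data_dir + "/", backup_dir + "/"]),
        # Do the filesystem maintenance here.
        ("copy the SSTables back, then start Cassandra again",
         ["rsync", "-a", backup_dir + "/", data_dir + "/"]),
    ]


for description, argv in copy_and_restore_plan("127.0.0.1"):
    print(description + ": " + " ".join(argv))
```

Printing the plan first makes it easy to review the order of steps for each of the nodes before running anything by hand.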

If you plan to turn a node off and clear its data you should remove the node's token from
the ring. You can either use nodetool decommission, which will distribute the data around the
ring, or turn it off and then use nodetool removetoken, which will not.
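
The two options above can be sketched as a small helper that picks the nodetool command. The function name, its parameters, and the example hosts and token are hypothetical; the only real pieces are the nodetool subcommands decommission and removetoken. The sketch assumes decommission is run against the node that is leaving while it is still up, and removetoken is run against another live node once the leaving node is off.

```python
def token_removal_command(node_is_up, node_host, peer_host=None, token=None):
    """Build the nodetool argv for taking a node's token out of the ring."""
    if node_is_up:
        # The node is still running: ask it to leave the ring itself.
        return ["nodetool", "-h", node_host, "decommission"]
    if peer_host is None or token is None:
        raise ValueError("removetoken needs a live peer host and the dead node's token")
    # The node is already off: ask a live peer to remove its token.
    return ["nodetool", "-h", peer_host, "removetoken", str(token)]


# Hypothetical hosts and token, for illustration only.
print(" ".join(token_removal_command(True, "10.0.0.5")))
print(" ".join(token_removal_command(False, "10.0.0.5",
                                     peer_host="10.0.0.6",
                                     token=85070591730234615865843651857942052864)))
```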

> 1. I realize this will put a heavier I/O load on the replication nodes to AntiCompact
> the CF's, but what kind of load does this put on the JVM. Are there any gotchas I should be
> aware of to prevent long gc times or OOM exceptions on the replication nodes.
We don't have the AntiCompaction step any more. If your app is stable I would assume the repair
process would be too. Do your normal repair processes complete OK?

> 3. Documentation at http://wiki.apache.org/cassandra/Operations says that the thrift
> port is not active on the bootstrapping node during the streaming process. What is the process
> that brings the node up-to-date with mutations that occurred during the time of the bootstrap?
> Maybe it's only reads that are disabled and writes are allowed?

Thrift is the connection the client uses; disabling it means clients cannot write to the node
directly. The node will announce its intention to take ownership of a token range in the ring
when the bootstrap starts. From that point on other nodes will include it in write requests but
not read requests. During that time your data is replicated to RF+1 nodes.
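
A toy model of the behaviour described above, not Cassandra's actual code: writes go to the current replicas plus any replica the joining node will become, reads go to the current replicas only. The ring layout, the node names, and the simple "next RF nodes clockwise" placement are all assumptions made for the illustration.

```python
from bisect import bisect_left


def replicas(ring, key_token, rf):
    """ring is a list of (token, node) sorted by token; take rf nodes clockwise."""
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


def read_endpoints(ring, key_token, rf):
    # Reads ignore the joining node until the bootstrap has finished.
    return replicas(ring, key_token, rf)


def write_endpoints(ring, joining, key_token, rf):
    # Writes go to the current replicas plus the joining node if it will own the key.
    current = replicas(ring, key_token, rf)
    future = replicas(sorted(ring + [joining]), key_token, rf)
    return current + [n for n in future if n not in current]


ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
joining = (10, "E")  # bootstrapping node
print(read_endpoints(ring, 5, 3))            # current replicas only
print(write_endpoints(ring, joining, 5, 3))  # RF+1 nodes for the affected range
```

In this model only keys in a range the joining node is taking over see RF+1 write targets; keys elsewhere in the ring still go to RF nodes.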
 
> 4. What happens if schema changes (add/drop column families) occur in the cluster while
> the bootstrap is in progress?
They will be distributed to the node when it comes back. Until it gets the new updates it
will log ERRORs for mutations to non-existent CFs. Best advice is: do not make those changes.


Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 11 Aug 2011, at 09:54, Chad Johnson wrote:

> Hi,
> 
> I have a 15 node cluster with a RF=3 running version 0.7.5. I am planning to perform
> some filesystem maintenance on each of the nodes. The filesystem happens to be on the partition
> holding the keyspace data. The maintenance means that all the SSTables for our keyspace will
> be destroyed. Rather than backup all the data to a backup disk and restore, my plan was to
> bring the node down, perform the maintenance, keep the original initial_token, set auto_bootstrap
> to true and let Cassandra repopulate the data through the streaming process. Nodes in the
> cluster will have a load of about 250 to 300GB
> 
> I have a couple questions regarding bootstrapping and the streaming process.
> 
> 1. I realize this will put a heavier I/O load on the replication nodes to AntiCompact
> the CF's, but what kind of load does this put on the JVM. Are there any gotchas I should be
> aware of to prevent long gc times or OOM exceptions on the replication nodes.
> 2. If the initial_token is not changed, is it correct to assume that anticompaction will
> occur only on the replication nodes and not throughout the cluster as the key space has not
> been modified.
> 3. Documentation at http://wiki.apache.org/cassandra/Operations says that the thrift
> port is not active on the bootstrapping node during the streaming process. What is the process
> that brings the node up-to-date with mutations that occurred during the time of the bootstrap?
> Maybe it's only reads that are disabled and writes are allowed?
> 4. What happens if schema changes (add/drop column families) occur in the cluster while
> the bootstrap is in progress?
> 
> Thanks for your help
> 
> Chad

