cassandra-user mailing list archives

From Robert Coli <rc...@eventbrite.com>
Subject Re: copy and rename sstable files as keyspace migration approach
Date Tue, 23 Feb 2016 18:36:45 GMT
On Tue, Feb 23, 2016 at 6:44 AM, Jarod Guertin <jarod.guertin@sparkpost.com>
wrote:

> Being fairly new to Cassandra, I'd like to run the following with the
> experts to make sure it's an ok thing to do.
>
> We have a particular case where we have multiple keyspaces with multiple
> tables each and we want to migrate to a new unique keyspace on the same
> cluster.
>
> The approach envisioned is:
> 1. take snapshots on all the nodes
> 2. create the new keyspace and all the tables with identical schema
> settings (just a different name and keyspace location)
> 3. one node at a time, stop cassandra, copy the db files from the old
> keyspace\table locations to the new keyspace\table locations and rename the
> db filename to use the new keyspace name; then restart cassandra
> 4. verify cassandra is running, then repeat step 3 for each other node
> 5. once all done switch our application calls to use the new keyspace \
> tables
> 6. run node repair on each node, one node at a time
>
> It is understood that between the snapshots (1) and using the new keyspace
> (5) that any changes would not be included in the migration, it would be
> done during a maintenance window when only read operations would be
> permitted.  I should also mention that our number of cassandra nodes is
> greater than the replication factor (3).
>

This is essentially the same operation as renaming a columnfamily, which I
described in this Jira comment (someone else added some useful details
there):

https://issues.apache.org/jira/browse/CASSANDRA-1585?focusedCommentId=13488959&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13488959

It's also similar to the "copy-the-sstables" method described here:

https://www.pythian.com/blog/bulk-loading-options-for-cassandra/

Notes on your variant:

- 1) Why snapshot? Just for safety?
- 3) Add nodetool drain before stopping cassandra, so that memtables are
flushed to sstables on disk first.
- 3) If you're "copying", you should strongly consider hard linking instead.
That way you keep the (immutable) files in both places but only use the
disk space once. [1]
- 6) is unnecessary if you've done things properly, which you could verify
by reading a representative known set of data before and after.
- Presumably there is a silent 7) where you drop the old keyspaces/CFs?
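
The hard-linking variant of step 3 could be sketched in Python roughly as
follows. This is an illustration under assumptions, not a tested procedure:
link_sstables and its parameters are hypothetical names, and it assumes the
file names begin with the old keyspace name followed by a dash, as the rename
in step 3 implies. Files without that prefix are linked under their original
name. Both directories must be on the same filesystem, the node must be
drained and stopped first, and the caveat in [1] still applies.

```python
import os
from pathlib import Path


def link_sstables(old_dir, new_dir, old_ks, new_ks, dry_run=True):
    """Hard-link each file in old_dir into new_dir, swapping the keyspace prefix.

    Assumption: file names start with "<old_ks>-". Files without that prefix
    keep their name. Returns the list of (source, destination) pairs.
    """
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    prefix = old_ks + "-"
    pairs = []
    for src in sorted(old_dir.iterdir()):
        if not src.is_file():
            continue  # skip subdirectories such as snapshots/ or backups/
        name = src.name
        if name.startswith(prefix):
            name = new_ks + "-" + name[len(prefix):]
        dst = new_dir / name
        pairs.append((src, dst))
        if not dry_run:
            # Hard link: a second directory entry for the same data, so no
            # extra disk space is used. Raises FileExistsError if dst exists.
            os.link(src, dst)
    return pairs
```

Calling it with dry_run=True (the default) only returns the planned
source/destination pairs, which can be reviewed before anything on disk is
touched; dry_run=False creates the links.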

=Rob

[1] In some very new versions of Cassandra, this may not be safe to do with
certain meta information files which are sadly no longer immutable.
