incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: copy data from multi-node cluster to single node
Date Tue, 05 Jul 2011 09:15:54 GMT
> Is it possible the snapshots from different nodes have the same name?
The directory name will be made up of the current timestamp on the machine and the optional
name passed via the command line. 

The SSTables from different nodes may have name collisions. If you are aggregating data from
multiple nodes onto one, you will need to rename the colliding files manually.
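
A minimal sketch of how the renaming could be planned. It assumes file names of the form <ColumnFamily>-<version>-<generation>-<Component> (for example Users-f-1-Data.db), so check the names in your own data directory first. The function name and the node labels are hypothetical. It only works out new names; it does not touch any files.

```python
import re
from collections import defaultdict

# Assumed name layout: <ColumnFamily>-<version>-<generation>-<Component>
# e.g. "Users-f-1-Data.db". Verify this against your own data directory.
SSTABLE_NAME = re.compile(
    r"^(?P<cf>.+)-(?P<version>[a-z]+)-(?P<gen>\d+)-(?P<component>[^-]+)$"
)


def plan_renames(files_by_node):
    """Map (node, old_name) -> new_name so that no two SSTables collide.

    files_by_node: dict of node label -> list of SSTable file names.
    All components of one SSTable (Data, Index, Filter, ...) receive the
    same new generation number.
    """
    next_gen = defaultdict(int)  # per column family counter
    assigned = {}                # (node, cf, old generation) -> new generation
    plan = {}
    for node in sorted(files_by_node):
        for name in sorted(files_by_node[node]):
            match = SSTABLE_NAME.match(name)
            if match is None:
                continue  # not recognised as an SSTable component, skip it
            cf, version = match["cf"], match["version"]
            key = (node, cf, int(match["gen"]))
            if key not in assigned:
                next_gen[cf] += 1
                assigned[key] = next_gen[cf]
            plan[(node, name)] = (
                f"{cf}-{version}-{assigned[key]}-{match['component']}"
            )
    return plan


if __name__ == "__main__":
    example = {
        "node1": ["Users-f-1-Data.db", "Users-f-1-Index.db"],
        "node2": ["Users-f-1-Data.db", "Users-f-1-Index.db"],
    }
    for (node, old), new in sorted(plan_renames(example).items()):
        print(node, old, "->", new)
```

Apply the plan while copying each node's files onto the single node, with that node stopped.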

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5 Jul 2011, at 14:59, Zhu Han wrote:

> On Tue, Jul 5, 2011 at 8:58 AM, aaron morton <aaron@thelastpickle.com> wrote:
>> How do you change the name of a cluster?  The FAQ instructions do not seem to work for me - are they still valid for 0.7.5?
>> Is the backup / restore mechanism going to work, or is there a better/simpler way to copy data from multi-node to single-node?
> 
> Bug fixed on 0.7.6 https://github.com/apache/cassandra/blob/cassandra-0.7.6-2/CHANGES.txt#L21
> 
> Also you should move to 0.7.6 to get the Gossip fix https://github.com/apache/cassandra/blob/cassandra-0.7.6-2/CHANGES.txt#L6
> 
> When it comes to moving the data back to a single node I would:
> - run repair
> - snapshot prod node
> - clear all data including the system KS data from the dev node
> - copy the snapshot data for only your KS to the dev node into the correct directory, e.g. data/<my-keyspace>.
> - start the dev node
> - add your KS, the node will now load the data
> 
> Ignoring the system data means the dev node can sort its cluster name and token out using the yaml file.
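
The copy step above could be scripted roughly as follows. This is a sketch only: the function name and all paths are hypothetical, and it assumes the snapshot directory holds just the files for the one keyspace you want. It refuses to overwrite an existing file, so name collisions show up as errors instead of silent data loss.

```python
import shutil
from pathlib import Path


def copy_keyspace_snapshot(snapshot_dir, dev_data_dir, keyspace):
    """Copy every file from one snapshot directory into <dev_data_dir>/<keyspace>.

    Returns the sorted list of copied file names. Intended to be run while
    the dev node is stopped.
    """
    source = Path(snapshot_dir)
    target = Path(dev_data_dir) / keyspace
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.iterdir()):
        if not path.is_file():
            continue  # ignore nested directories
        destination = target / path.name
        if destination.exists():
            raise FileExistsError(f"{destination} already exists; rename it first")
        shutil.copy2(path, destination)  # copy2 also preserves timestamps
        copied.append(path.name)
    return copied
```

For example, copy_keyspace_snapshot("/backup/node1/MyKeyspace-snapshot", "/var/lib/cassandra/data", "MyKeyspace"), where both paths and the keyspace name are placeholders for your own layout.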
> 
> Even with 3 nodes and RF 3 it's impossible to ever say that one node has a complete copy of the data. Running repair will make it more likely, but the node could drop a mutation message during the repair or drop off gossip for a few seconds. If you really want to have *everything* from the prod cluster, then copy the data from all 3 nodes onto the dev node and compact it down.
> 
> Is it possible the snapshots from different nodes have the same name?
>  
> 
> Hope that helps. 
>   
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 5 Jul 2011, at 03:05, Ross Black wrote:
> 
>> Hi,
>> 
>> I am using Cassandra 0.7.5 on Linux machines.
>> 
>> I am trying to back up data from a multi-node cluster (3 nodes) and restore it into a single-node cluster that has a different name (for development testing).
>> 
>> The multi-node cluster is backed up using clustertool global_snapshot, and then I copy the snapshot from a single node and replace the data directory in the single node.
>> The multi-node cluster has a replication factor of 3, so I assume that restoring any node from the multi-node cluster will be the same.
>> When started up this fails with a node name mismatch.
>> 
>> I have tried removing all the Location* files in the data directory (as per http://wiki.apache.org/cassandra/FAQ#clustername_mismatch) but the single node then fails with an error message:
>> org.apache.cassandra.config.ConfigurationException: Found system table files, but they couldn't be loaded. Did you change the partitioner?
>> 
>> 
>> How do you change the name of a cluster?  The FAQ instructions do not seem to work for me - are they still valid for 0.7.5?
>> Is the backup / restore mechanism going to work, or is there a better/simpler way to copy data from multi-node to single-node?
>> 
>> Thanks,
>> Ross
>> 
> 
> 

