incubator-cassandra-user mailing list archives

From: David Laube
Subject: Read inconsistency after backup and restore to different cluster
Date: Thu, 14 Nov 2013 20:37:01 GMT

Hi All,

After running through our backup and restore process FROM our test production TO our staging
environment, we are seeing inconsistent reads from the cluster we restored to. We have the
same number of nodes in both clusters. For example, when we select data from a column family
on the newly restored cluster, sometimes the expected data is returned and other times
it is not. These selects are carried out one after another with very little delay. It is almost
as if the data only exists on some of the nodes, or perhaps the token ranges are dramatically
different. We are using vnodes, so I am not exactly sure how this plays into the equation.
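
One way to check the token-range suspicion is to compare the token set a snapshot's source node held in cluster-A with the token set of the node it was restored onto in cluster-B. The sketch below assumes each node's tokens have already been exported to a plain file, one token per line (for example, pulled from `nodetool ring` output). The file names and the `tokens_match` helper are hypothetical.

```shell
#!/bin/sh
# tokens_match FILE_A FILE_B
# Succeeds (exit 0) when both files contain the same set of tokens,
# regardless of order; fails (exit 1) otherwise.
# Assumption: one token per line, exported beforehand by some other means.
tokens_match() {
    a_sorted="$(sort "$1")"
    b_sorted="$(sort "$2")"
    [ "$a_sorted" = "$b_sorted" ]
}

# Example use (hypothetical file names):
#   tokens_match cluster-a-node1.tokens cluster-b-node1.tokens \
#       || echo "node1: token sets differ between clusters"
```

If the token sets differ, the restored SSTables on that node may hold rows the node does not own in the new ring, which would be consistent with reads that succeed only some of the time.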

We are running Cassandra 2.0.2 with vnodes and deploying via Chef. The backup and restore process
is currently orchestrated using bash scripts and Chef's distributed SSH. I have outlined the
process below for review.

(I) Backup cluster-A (with existing prod data):
1. Run "nodetool flush" on each of the nodes in a 5-node ring.
2. Run "nodetool snapshot keyspace_name" on each of the nodes in a 5-node ring.
3. Archive the snapshot data from the snapshots directory on each node, creating a single
archive per node.
4. Copy the snapshot data archive for each of the nodes to S3.
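
As a rough sketch, the per-node backup steps above could look like the following. Only the `nodetool flush` and `nodetool snapshot` commands come from the steps themselves. The data directory, the archive path, the bucket name, the use of the AWS CLI for the upload, and the `run` and `backup_node` helpers are all assumptions made for illustration. The script prints commands by default and only executes them when `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch of backup steps 1-4 for a single node. Everything below except the
# two nodetool commands is an assumption, not part of the described process.
KEYSPACE="${KEYSPACE:-keyspace_name}"
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data}"   # assumed data directory
BUCKET="${BUCKET:-s3://example-backup-bucket}"    # hypothetical bucket
DRY_RUN="${DRY_RUN:-1}"                           # 1 = print commands only

# Print the command; execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# backup_node NODE_LABEL: run on the node itself; NODE_LABEL names the archive.
backup_node() {
    node="$1"
    archive="/tmp/${node}-${KEYSPACE}-snapshot.tar.gz"
    run nodetool flush                      # step 1
    run nodetool snapshot "$KEYSPACE"       # step 2
    # Step 3: snapshots sit under <data_dir>/<keyspace>/<column_family>/snapshots/
    run sh -c "cd '$DATA_DIR/$KEYSPACE' && tar czf '$archive' */snapshots"
    run aws s3 cp "$archive" "$BUCKET/$node/"   # step 4 (AWS CLI assumed)
}

# Example: backup_node node1
```

Keeping the node label in the archive name records which source node each archive came from, which matters later when deciding where each archive is restored.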

(II) Restore backup FROM cluster-A TO cluster-B:
*NOTE: cluster-B is a freshly deployed ring with no data, and it uses a different cluster
name (it is our staging cluster).

1. Deploy 5 nodes as part of the cluster-B ring.
2. Create the keyspace_name keyspace and column families on cluster-B.
3. Stop Cassandra on all 5 nodes in the cluster-B ring.
4. Clear commit logs on cluster-B with: "rm -f /var/lib/cassandra/commitlog/*"
5. Copy one of the five snapshot archives from cluster-A to each of the five nodes in the new cluster-B.
6. Extract the archives to /var/lib/cassandra/data/keyspace_name, ensuring that the column
family directories and associated .db files are in place under /var/lib/cassandra/data/keyspace_name/columnfamily1/
7. Start Cassandra on each of the nodes in cluster-B.
8. Run "nodetool repair" on each of the nodes in cluster-B.
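
Restore steps 3 through 8 can be sketched per node as three phases, where each phase is completed on all five nodes before the next begins. The `rm -f` and `nodetool repair` commands are taken from the steps above. The `service cassandra` commands, the archive layout (column family directories with their SSTable files at the archive root), and the helper names are assumptions. Commands are only printed unless `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch of restore steps 3-8 for a single node, split into phases.
# Assumptions: Cassandra is managed via "service cassandra", and the archive
# unpacks directly into <column_family>/<sstable files>.
KEYSPACE="${KEYSPACE:-keyspace_name}"
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data}"
COMMITLOG_DIR="${COMMITLOG_DIR:-/var/lib/cassandra/commitlog}"
DRY_RUN="${DRY_RUN:-1}"                      # 1 = print commands only

# Print the command; execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Phase 1 (steps 3-6): stop Cassandra, clear commit logs, unpack the archive.
stop_and_load() {
    archive="$1"
    run service cassandra stop
    run sh -c "rm -f '$COMMITLOG_DIR'/*"
    run tar xzf "$archive" -C "$DATA_DIR/$KEYSPACE"
}

# Phase 2 (step 7): start Cassandra once every node has been loaded.
start_node() {
    run service cassandra start
}

# Phase 3 (step 8): repair once every node is back up.
repair_node() {
    run nodetool repair
}

# Example: stop_and_load /tmp/node1-keyspace_name-snapshot.tar.gz
```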

Please let me know if you see any major errors or deviations from best practices that could
be contributing to our read inconsistencies. I'll be happy to answer any specific questions
you may have regarding our configuration. Thank you in advance!

Best regards,
-David Laube