cassandra-user mailing list archives

From ZeroUno <zerozerouno...@gmail.com>
Subject sstableloader usage doubts
Date Thu, 04 Jun 2015 12:39:08 GMT
Hi,
while defining backup and restore procedures for a Cassandra cluster I'm 
trying to use sstableloader for restoring a snapshot from a backup, but 
I'm not sure I fully understand the documentation on how it should be used.

Looking at the examples in the doc at 
http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsBulkloader_t.html 
it seems that the path_to_keyspace passed as an argument is exactly the 
Cassandra data directory. So do you first move the data into its final 
target location and then stream it to the cluster again?

Let's take a step back. My cluster is composed of two data centers. Each 
data center has two nodes (nodeA1, nodeA2 for center A, nodeB1, nodeB2 
for center B).
I'm using NetworkTopologyStrategy with RF=2.

For periodic backups I create a snapshot on two nodes simultaneously in 
a single data center (nodeA1 and nodeA2), and then move the snapshot 
files to a safe place.
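
A minimal sketch of locating the snapshot files to copy away, assuming the snapshot was taken with `nodetool snapshot -t <tag> mykeyspace`, which places the files under `snapshots/<tag>` inside each table directory. The function name `list_snapshot_dirs` and the tag `backup1` in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch: print every snapshot directory of a keyspace for a given tag.
# Assumes the snapshot was created beforehand, e.g. with:
#   nodetool snapshot -t backup1 mykeyspace
# Usage: list_snapshot_dirs KEYSPACE_DIR TAG
list_snapshot_dirs() {
    # Snapshot files live in <table directory>/snapshots/<tag>/
    find "$1" -type d -path "*/snapshots/$2"
}
```

For example, `list_snapshot_dirs /mydatapath/cassandra/data/mykeyspace backup1` would print one directory per table, each of which can then be copied to the backup location.
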
To simulate a disaster recovery situation, I truncate all tables to 
erase the data (but not the schema, which would be re-created anyway by 
my application), I stop cassandra on all 4 nodes, I move the snapshot 
backup files back to their original locations (e.g. 
/mydatapath/cassandra/data/mykeyspace/mytable1/) on nodeA1 and nodeA2, 
then I restart cassandra on all 4 nodes.

Finally, I run:

> sstableloader -d nodeA1,nodeA2,nodeB1,nodeB2 /mydatapath/cassandra/data/mykeyspace/mytable1/
> sstableloader -d nodeA1,nodeA2,nodeB1,nodeB2 /mydatapath/cassandra/data/mykeyspace/mytable2/
> sstableloader -d nodeA1,nodeA2,nodeB1,nodeB2 /mydatapath/cassandra/data/mykeyspace/mytable3/
> [...and so on for all tables]

...on both nodeA1 and nodeA2, where I restored the snapshot.
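
The per-table invocations above could be written as a loop. This is only a sketch: the keyspace path and host list are the ones from the commands above, while the function name `load_all_tables` and the `LOADER` variable (which allows a dry run with `LOADER=echo`) are hypothetical additions.

```shell
#!/bin/sh
# Sketch: run sstableloader once for each table directory of a keyspace.
# KEYSPACE_DIR and HOSTS default to the values used in the message.
# LOADER is a hypothetical override; set LOADER=echo to print the
# commands instead of running them.
KEYSPACE_DIR="${KEYSPACE_DIR:-/mydatapath/cassandra/data/mykeyspace}"
HOSTS="${HOSTS:-nodeA1,nodeA2,nodeB1,nodeB2}"
LOADER="${LOADER:-sstableloader}"

load_all_tables() {
    for table_dir in "$KEYSPACE_DIR"/*/; do
        [ -d "$table_dir" ] || continue   # nothing matched the glob
        "$LOADER" -d "$HOSTS" "$table_dir"
    done
}
```

Calling `load_all_tables` after sourcing the script issues the same command as above for each table directory found.
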

Is that correct?

I observed some strange behaviour after doing this: when I truncated 
the tables again, a select count(*) on one of the A nodes still returned 
a non-zero number, as if the data was still there.
I started thinking that maybe the source sstable directory for 
sstableloader should not be the data directory itself, as this causes 
some kind of "double data" problem...
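
The alternative considered here, loading from a directory outside the data directory, could be sketched as follows. It relies on sstableloader taking the keyspace and table names from the last two components of the path it is given, so the staging directory has to end in `<keyspace>/<table>`. The function name `stage_table` and the paths in the usage line are hypothetical, and whether this resolves the behaviour described above is not established here.

```shell
#!/bin/sh
# Sketch: copy one table's backed-up sstable files into a staging
# directory shaped <staging root>/<keyspace>/<table>, outside the
# Cassandra data directory, and print the resulting path.
# Usage: stage_table BACKUP_DIR STAGING_ROOT KEYSPACE TABLE
stage_table() {
    backup_dir=$1
    staging_root=$2
    keyspace=$3
    table=$4
    target="$staging_root/$keyspace/$table"
    mkdir -p "$target" &&
        cp "$backup_dir"/* "$target"/ &&
        printf '%s\n' "$target"
}
```

A restore of one table might then look like `sstableloader -d nodeA1,nodeA2,nodeB1,nodeB2 "$(stage_table /backup/mytable1 /tmp/restore mykeyspace mytable1)"`, run with the data directory left untouched.
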

Can anyone please tell me if this is the correct way to proceed?
Thank you very much!

-- 
01

