That sounds a little complicated.
Do you want to get the data out for an off-node backup, or is it for processing in another system?
You may get by using:
* TTL to expire data via compaction
* snapshots for backups
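The TTL route frees disk space only after a delay. This is a small model of that delay, not Cassandra code. Once a TTL'd column expires, reads stop returning it and it behaves like a tombstone. Compaction drops it only after `gc_grace_seconds` (default 864000, i.e. 10 days) has also passed. The column names and times below are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Cassandra's default gc_grace_seconds for a column family: 10 days.
DEFAULT_GC_GRACE_SECONDS = 864_000


@dataclass
class Column:
    name: str
    written_at: int             # write time, seconds since the epoch
    ttl: Optional[int] = None   # seconds; None means the column never expires

    def expires_at(self) -> Optional[int]:
        return None if self.ttl is None else self.written_at + self.ttl


def compact(columns: List[Column], now: int,
            gc_grace: int = DEFAULT_GC_GRACE_SECONDS) -> List[Column]:
    """Model of which columns survive a compaction run at time `now`.

    An expired column is treated like a tombstone: compaction only drops it
    (freeing the disk space) once gc_grace seconds have passed since expiry.
    """
    return [
        col for col in columns
        if col.expires_at() is None or col.expires_at() + gc_grace >= now
    ]


day = 86_400
cols = [Column("reading-1", written_at=0, ttl=7 * day), Column("config", written_at=0)]
print([c.name for c in compact(cols, now=10 * day)])  # ['reading-1', 'config']
print([c.name for c in compact(cols, now=18 * day)])  # ['config']
```

So with a 7-day TTL, the space comes back at the first compaction after day 17, not day 7.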
On 3/01/2012, at 11:00 AM, Alexandru Sicoe wrote:
Hi everyone and Happy New Year!
I need advice on organizing data
flow out of my 3-node Cassandra 0.8.6 cluster. I am configuring my
keyspace to use the NetworkTopologyStrategy. I have 2 data centers, each
with a replication factor of 1 (i.e. DC1:1; DC2:1). The configuration of the …
I assign tokens like this:
node1 = 0
node2 = 1
node3 = 85070591730234615865843651857942052864
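The token for node3 is exactly 2**126, the midpoint of the RandomPartitioner ring (0 to 2**127), and node2's token is offset by 1 from node1's. That matches the usual multi-DC scheme. Each DC's nodes are spaced evenly around the whole ring, and each extra DC is shifted by a small offset so no tokens collide. This sketch reproduces the three tokens. It assumes RandomPartitioner and a layout with node1 and node3 in DC1 and node2 alone in DC2. The email does not spell out that layout, so it is inferred.

```python
# Assumption: RandomPartitioner, whose tokens lie in the range 0 .. 2**127.
RING_SIZE = 2 ** 127


def nts_tokens(nodes_per_dc):
    """Space each DC's nodes evenly around the ring, then shift DC number k
    by k so that no two nodes in the cluster get the same token."""
    tokens = {}
    for offset, nodes in enumerate(nodes_per_dc.values()):
        for i, node in enumerate(nodes):
            tokens[node] = i * RING_SIZE // len(nodes) + offset
    return tokens


# Assumed layout: node1 and node3 in DC1, node2 alone in DC2.
layout = {"DC1": ["node1", "node3"], "DC2": ["node2"]}
for node, token in sorted(nts_tokens(layout).items(), key=lambda kv: kv[1]):
    print(node, token)
```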
My write consistency level is ANY.
My data sources only insert data into node1 & node3. Essentially,
a replica of every value written ends up on node2, so node2 holds a
copy of all the data written to the cluster.
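The claim that node2 holds everything follows from how NetworkTopologyStrategy places replicas. For each DC, it walks the ring clockwise from the key's token to the first node in that DC. This is a simplified sketch that ignores racks, using the tokens above and the same assumed DC layout (node1 and node3 in DC1, node2 in DC2). With DC1:1 and DC2:1, every key gets one DC1 replica plus node2, the only node in DC2.

```python
import bisect

TOKENS = {"node1": 0, "node2": 1, "node3": 85070591730234615865843651857942052864}
DC_OF = {"node1": "DC1", "node3": "DC1", "node2": "DC2"}  # assumed layout
REPLICATION = {"DC1": 1, "DC2": 1}


def replicas(key_token):
    """Simplified NetworkTopologyStrategy placement (no rack awareness)."""
    ring = sorted(TOKENS.items(), key=lambda kv: kv[1])
    token_list = [token for _, token in ring]
    # A node with token T owns (previous token, T], so start at the first
    # node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(token_list, key_token) % len(ring)
    needed = dict(REPLICATION)
    chosen = []
    for step in range(len(ring)):
        node, _ = ring[(start + step) % len(ring)]
        dc = DC_OF[node]
        if needed.get(dc, 0) > 0:
            chosen.append(node)
            needed[dc] -= 1
    return chosen


for key_token in (0, 5, 2 ** 126, 2 ** 126 + 1):
    print(key_token, replicas(key_token))
```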
When node2 starts getting full, I want a script that takes it
offline and runs a sequence of operations (compaction/snapshotting/exporting/truncating
the CFs) to back up the data to a remote location and to free the node
up so that it can take more data. When it comes back online, it will
receive hints from the other 2 nodes.
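The backup half of that sequence can be sketched as a dry-run plan that builds the commands without running them. Several pieces are assumptions. The host, keyspace name, remote destination and the 0.8-era snapshot path (`<data_dir>/<keyspace>/snapshots/`) are hypothetical. The `nodetool` argument order should be checked against `nodetool help` for the installed version. Taking the node offline is left out. Truncation is also omitted, because TRUNCATE is issued cluster-wide through a client such as cassandra-cli, not per node through nodetool.

```python
import shlex


def backup_plan(host, keyspace, data_dir, remote):
    """Return, in order, the commands for one backup cycle (dry run only).

    All names passed in are hypothetical placeholders; snapshot_dir assumes
    the pre-1.1 on-disk layout <data_dir>/<keyspace>/snapshots/.
    """
    snapshot_dir = f"{data_dir}/{keyspace}/snapshots/"
    return [
        ["nodetool", "-h", host, "flush", keyspace],     # memtables -> SSTables
        ["nodetool", "-h", host, "compact", keyspace],   # major compaction
        ["nodetool", "-h", host, "snapshot", keyspace],  # hard-link SSTables
        ["rsync", "-a", snapshot_dir, remote],           # ship snapshot off-node
    ]


plan = backup_plan("node2", "MyKeyspace", "/var/lib/cassandra/data",
                   "backup.example.com:/backups/node2/")
for cmd in plan:
    print(shlex.join(cmd))
```

Keeping the plan as data makes it easy to log or review before wiring it to `subprocess.run(cmd, check=True)`.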
This is how I plan to ship data out of my cluster without any
downtime or major performance penalty. The problem is that I also want to
truncate the CFs on node1 & node3 to free them of data, and I don't know
whether I can do this without downtime or serious performance penalties.
Is anyone using truncate to free CFs of data? How efficient is it?
Any observations or suggestions are much appreciated!