cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: emptying my cluster
Date Tue, 03 Jan 2012 08:44:12 GMT
That sounds a little complicated.

Do you want to get the data out for an off-node backup, or is it for processing in another
system?

You may get by using:

* TTL to expire data via compaction
* snapshots for backups
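
As a rough illustration of the TTL option, here is a simplified model of when a column written with a TTL stops being readable and when compaction can actually free its space. The function names are made up for this sketch, and it assumes the default gc_grace_seconds of 864000 (10 days). It also ignores details such as which SSTables take part in a given compaction, so treat it as an approximation rather than a description of the exact internals.

```python
# Simplified model of TTL expiry; names here are illustrative, not Cassandra APIs.
GC_GRACE_SECONDS = 864000  # assumed: Cassandra's default gc_grace_seconds (10 days)


def is_expired(write_time, ttl, now):
    """True once the column is past its TTL and no longer returned to readers."""
    return now >= write_time + ttl


def is_purgeable(write_time, ttl, now, gc_grace=GC_GRACE_SECONDS):
    """True once a compaction may drop the expired column and reclaim its space.

    Assumption: an expired column behaves like a tombstone, so it is only
    dropped after gc_grace seconds have passed since it expired.
    """
    return now >= write_time + ttl + gc_grace


if __name__ == "__main__":
    one_week = 7 * 24 * 3600
    # A column written at t=0 with a one-week TTL, checked after 8 days.
    print(is_expired(0, one_week, 8 * 24 * 3600))
    print(is_purgeable(0, one_week, 8 * 24 * 3600))
```

If this model holds for your version, disk space comes back roughly TTL plus gc_grace_seconds after the write, so lowering gc_grace_seconds on the column family shortens that delay.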

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/01/2012, at 11:00 AM, Alexandru Sicoe wrote:

> Hi everyone and Happy New Year!
> 
> I need advice on organizing data flow out of my 3-node Cassandra 0.8.6 cluster.
> I am configuring my keyspace to use the NetworkTopologyStrategy. I have 2 data centers, each
> with a replication factor of 1 (i.e. DC1:1; DC2:1). The configuration of the PropertyFileSnitch
> is:
>
>     ip_node1=DC1:RAC1
>     ip_node2=DC2:RAC1
>     ip_node3=DC1:RAC1
>
> I assign tokens like this:
>
>     node1 = 0
>     node2 = 1
>     node3 = 85070591730234615865843651857942052864
> 
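
The tokens quoted above match the usual scheme of spacing tokens evenly within each data center and offsetting the second data center by 1 so that no two nodes share a token. A minimal sketch of that calculation, assuming the RandomPartitioner (token range 0 to 2**127); the helper name `dc_tokens` is made up for illustration:

```python
# Sketch: per-data-center token assignment, assuming RandomPartitioner.
RING_SIZE = 2 ** 127  # size of the RandomPartitioner token space


def dc_tokens(node_count, offset=0):
    """Evenly spaced tokens for one data center, shifted by `offset`."""
    return [(i * RING_SIZE // node_count) + offset for i in range(node_count)]


dc1 = dc_tokens(2)            # node1 and node3
dc2 = dc_tokens(1, offset=1)  # node2, offset by 1 to avoid a token collision

if __name__ == "__main__":
    print("DC1:", dc1)
    print("DC2:", dc2)
```
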
> My write consistency level is ANY.
> 
> My data sources are only inserting data in node1 & node3. Essentially what happens
> is that a replica of every input value will end up on node2. Node2 thus has a copy of the
> entire data written to the cluster. When node2 starts getting full, I want to have a script
> which pulls it off-line and does a sequence of operations (compaction/snapshotting/exporting/truncating
> the CFs) in order to back up the data in a remote place and to free it up so that it can take
> more data. When it comes back on-line it will take hints from the other 2 nodes.
> 
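
To see why node2 ends up with a copy of everything, here is a small sketch of replica selection under the layout described above. It is a simplified model of NetworkTopologyStrategy, not the actual implementation: it walks the ring clockwise from the key's token and takes nodes until each data center has its configured number of replicas, and it ignores racks (all three nodes are in RAC1 anyway). The names `RING`, `REPLICATION` and `replicas` are made up for the example.

```python
from bisect import bisect_left

# Ring taken from the quoted message: token -> (node, data center).
RING = {
    0: ("node1", "DC1"),
    1: ("node2", "DC2"),
    85070591730234615865843651857942052864: ("node3", "DC1"),
}
REPLICATION = {"DC1": 1, "DC2": 1}  # replicas per data center


def replicas(key_token):
    """Nodes that hold a key, in ring order starting from its primary range."""
    tokens = sorted(RING)
    # The first node whose token is >= the key's token owns it; wrap at the end.
    start = bisect_left(tokens, key_token) % len(tokens)
    placed = {dc: 0 for dc in REPLICATION}
    chosen = []
    for step in range(len(tokens)):
        node, dc = RING[tokens[(start + step) % len(tokens)]]
        if placed[dc] < REPLICATION[dc]:
            chosen.append(node)
            placed[dc] += 1
    return chosen


if __name__ == "__main__":
    for token in (0, 1, 5, 2 ** 126 + 1):
        print(token, replicas(token))
```

Because node2 is the only node in DC2 and DC2 must hold one replica of every row, node2 appears in the replica list for every token in this model.
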
> This is how I plan on shipping data out of my cluster without any downtime or any major
> performance penalty. The problem is when I also want to truncate the CFs in node1 & node3
> to free them of data as well. I don't know whether I can do this without any downtime or without
> any serious performance penalties. Is anyone using truncate to free up CFs of data? How efficient
> is this?
> 
> Any observations or suggestions are much appreciated!
> 
> Cheers,
> Alex

