cassandra-user mailing list archives

From Alexandru Sicoe <>
Subject Re: emptying my cluster
Date Thu, 05 Jan 2012 11:31:23 GMT

On Wed, Jan 4, 2012 at 9:54 PM, aaron morton <>wrote:

> Some thoughts on the plan:
> * You are monkeying around with things, do not be surprised when
> surprising things happen.

I am just trying to explore different solutions for solving my problem.

> * Deliberately unbalancing the cluster may lead to Bad Things happening.

I will take your advice on this. I would have liked to have an extra node
to have 2 nodes in each DC.

> * In the design discussed it is perfectly reasonable for data not to be on
> the archive node.

Do you mean with the 2 DC setup I mentioned combined with TTL? If I have
the 2 DC setup but don't use TTL, I don't understand why data wouldn't be
on the archive node.

> * Truncate is a cluster wide operation and all nodes must be online before
> it will start.
> * Truncate will snapshot before deleting data, you could use this snapshot.
> * TTL for a column is for a column no matter which node it is on.

Thanks for clarifying these!

> * IMHO Cassandra data files (sstables or JSON dumps) are not a good format
> for a historical archive, nothing against Cassandra. You need the lowest
> common format.

So what data format should I use for historical archiving?

> If you have the resources for a second cluster could you put the two
> together and just have one cluster with a very large retention policy? One
> cluster is easier than two.

I am constrained to have limited retention on the Cassandra cluster that is
collecting the data. Once I archive the data for long term storage I
cannot bring it back into the same Cassandra cluster that collected it in
the first place, because it's in an enclosed network with strict rules. I
have to load it into another cluster outside the enclosed network. It's not
that I have the resources for a second cluster, I am forced to use a second
cluster.

> Assuming there is no business case for this, consider either:
> * Dumping the historical data into a Hadoop (with or without HDFS) cluster
> with high compression. If needed you could then run Hive / Pig to fill a
> companion Cassandra cluster with data on demand. Or just query using Hadoop.
> * Dumping the historical data to files with high compression and a roll
> your own solution to fill a cluster.

Ok, thanks for these suggestions, I will have to investigate further.
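
One possible reading of the "files with high compression" suggestion is
sketched below: samples are written to a gzip-compressed CSV file and read
back. The function names, the (variable_id, timestamp, value) column layout
and the file name are assumptions made for illustration; nothing in this
thread prescribes them.

```python
import csv
import gzip
import os
import tempfile


def dump_samples(path, samples):
    """Write (variable_id, timestamp, value) tuples to a gzip-compressed CSV file."""
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["variable_id", "timestamp", "value"])  # header row
        writer.writerows(samples)


def load_samples(path):
    """Read the samples back as (variable_id, timestamp, value) tuples."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        return [(var_id, int(ts), float(value)) for var_id, ts, value in reader]


if __name__ == "__main__":
    # Hypothetical sample data, written to a temporary directory.
    demo_path = os.path.join(tempfile.mkdtemp(), "archive.csv.gz")
    dump_samples(demo_path, [("var42", 1325763083, 3.5)])
    print(load_samples(demo_path))
```

A plain text format like this can be read by almost any tool, which is the
point of the "lowest common format" remark; a real archive would also need
a decision on how to split files (for example per day or per variable).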

> Also consider talking to DataStax about DSE.
> Cheers
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> On 5/01/2012, at 1:41 AM, Alexandru Sicoe wrote:

> Hi,
> On Tue, Jan 3, 2012 at 8:19 PM, aaron morton <>wrote:
>> Running a time based rolling window of data can be done using the TTL.
>> Backing up the nodes for disaster recovery can be done using snapshots.
>> Restoring any point in time will be tricky because you may restore columns
>> where the TTL has expired.
> Yeah, that's the thing...if I want to use the system as I explain further
> below, I cannot back up data (for later restoration) if I'm using TTLs.
>>> Will I get a single copy of the data in the remote storage or will it be
>>> twice the data (data + replica)?
>> You will get RF copies of the data. (By the way, there is no original copy.)
> Well, if I organize the cluster as I mentioned in the first email, I will
> get one copy of each row at a certain point in time on node2 if I take it
> offline, perform a major compaction and GC, won't I? I don't want to send
> duplicated data to the mass storage!
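
The "RF copies" answer can be illustrated for the earlier 3 node, RF=2
question with a small simulation. This is a simplified sketch, not
Cassandra's code: it assumes evenly spaced tokens, an MD5-based token
function standing in for RandomPartitioner, and replicas placed on
consecutive nodes around the ring. It does not model the 2 DC
NetworkTopologyStrategy layout, where DC2:1 puts exactly one replica of
each row on node2.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # RandomPartitioner token range


def token_for(key):
    """Map a row key onto the ring (simplified stand-in for RandomPartitioner)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE


def replicas_for(key, node_tokens, rf):
    """Return indexes of the rf nodes holding the key: the first node whose
    token is >= the key's token, then the following nodes clockwise."""
    first = bisect.bisect_left(node_tokens, token_for(key)) % len(node_tokens)
    return [(first + i) % len(node_tokens) for i in range(rf)]


def snapshot_all_nodes(keys, node_tokens, rf):
    """Return one list of keys per node, as a per-node snapshot would contain."""
    snapshots = [[] for _ in node_tokens]
    for key in keys:
        for node in replicas_for(key, node_tokens, rf):
            snapshots[node].append(key)
    return snapshots


node_tokens = [i * RING_SIZE // 3 for i in range(3)]  # 3 evenly spaced nodes
keys = ["row%d" % i for i in range(1000)]
snapshots = snapshot_all_nodes(keys, node_tokens, rf=2)
total_copies = sum(len(node_keys) for node_keys in snapshots)
print(total_copies)  # 2000: every row appears in exactly rf snapshots
```

Copying the snapshots of all three nodes to remote storage therefore ships
each row twice in this model, which is why archiving from a single node
that holds one full replica avoids the duplication.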
>> Can you share a bit more about the use case ? How much data and what sort
>> of read patterns ?
> I have several applications that feed into Cassandra about 2 million
> different variables (each representing a different monitoring
> value/channel). The system receives updates for each of these monitoring
> values at different rates. For each new update, the timestamp and value are
> recorded in a Cassandra name-value pair. The schema of Cassandra is built
> using one CF for data and 4 other CFs for metadata (the metadata CFs are
> static and hardly grow at all once they've been loaded). The data CF uses a
> row for each variable. Each row acts as a 4 hour time bin. I achieve this
> by creating the row key as a concatenation of the first 6 digits of the
> timestamp at which the data is inserted + the unique ID of the variable.
> After the time bin expires, a new row will be created for the same variable
> ID.
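
The row key scheme described above could look roughly like the sketch
below. The function name, the absence of a separator and the 10-digit
epoch-seconds timestamp are assumptions; the thread does not state the
timestamp format. Under these assumptions the 6-digit prefix changes every
10,000 seconds, so the actual bin width in the real system depends on the
timestamp format it uses and need not match this example.

```python
def row_key(timestamp, variable_id, prefix_digits=6):
    """Build a row key from the leading digits of the timestamp plus the variable ID."""
    return str(timestamp)[:prefix_digits] + str(variable_id)


# Two inserts for the same variable inside one bin share a row key ...
print(row_key(1325763083, "var42"))  # 132576var42
print(row_key(1325769999, "var42"))  # 132576var42
# ... and the next bin starts a new row for that variable.
print(row_key(1325770000, "var42"))  # 132577var42
```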
> The system can currently sustain the insertion load. Now I'm looking into organizing
> the flow of data out of the cluster and retrieval performance for random
> queries:
> Why do I need to organize the data out? Well, my requirement is to keep
> all the data coming into the system at the highest granularity for long
> term (several years). The 3 node cluster I mentioned is the online cluster
> which is supposed to be able to absorb the input load for a relatively
> short period of time, a few weeks (I am constrained to do this). After this
> period the data has to be shipped out of the cluster to a mass storage
> facility and the cluster needs to be emptied to make room for more data.
> Also, the online cluster will serve reads while it takes in data. For older
> data I am planning to have another cluster that gets loaded with data from
> the storage facility on demand and will serve reads from there.
> Why random queries? There is no specific use case about them, that's why I
> want to rely only on the built in Cassandra indexes for now. Generally the
> client will ask for sets of values within a time range up to 8-10 hours in
> the past. Apart from some sets of variables that will almost always be
> requested together, any combination is possible because this system will
> feed a web dashboard which will be used for debugging purposes: to
> correlate and aggregate streams of variables. Depending on the problem,
> different variable combinations could be investigated.
>> Can you split the data stream into a permanent log record and also
>> into Cassandra for a rolling window of queryable data?
> In the end, essentially that's what I've been meaning to do with
> organizing the cluster in a 2 DC setup: I wanted to have 2 nodes in DC1
> taking the data and reads (the rolling window) and replicating to the node
> in DC2 (the permanent log - of a single copy of the data). I was thinking
> of implementing the rolling window by emptying the nodes in DC1 using
> truncate instead of what you propose now with the rolling window using TTL.
> Ok, so I can do what you are saying easily if Cassandra allows me to have
> a TTL only on the first copy of the data and have the second replica
> without a TTL. Is this possible? I think it would solve my problem, as long
> as I can back up and empty the node in DC2 before the TTLs expire in the
> other 2 nodes.
> Cheers,
> Alex
>> Cheers
>>   -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> On 3/01/2012, at 11:41 PM, Alexandru Sicoe wrote:
>> Hi,
>> I need to build a system that stores data for years, so yes, I am backing
>> up data in another mass storage system from where it could be later
>> accessed. The data that I successfully back up has to be deleted from my
>> cluster to make space for new data coming in.
>> I am aware of snapshotting, which I will use for getting the data
>> out of node2: it creates hard links to the SSTables of a CF and then I can
>> copy over those files pointed to by the hard links into another location.
>> After that I get rid of the snapshot (hard links) and then I can truncate
>> my CFs. It's clear that snapshotting will give me a single copy of the data
>> in case I have a unique copy of the data on one node. It's not clear to me
>> what happens if I have let's say a cluster with 3 nodes and RF=2 and I do a
>> snapshot of every node and copy those snapshots to remote storage. Will I
>> get a single copy of the data in the remote storage or will it be twice the
>> data (data + replica)?
>> I've started reading about TTL and I think I can use it but it's not
>> clear to me how it would work in conjunction with the snapshotting/backing
>> up I need to do. I mean, it will impose a deadline by which I need to
>> perform a backup in order not to miss any data. Also, I might duplicate the
>> data if some columns don't expire fully between 2 backups. Any
>> clarifications on this?
>> Cheers,
>> Alex
>> On Tue, Jan 3, 2012 at 9:44 AM, aaron morton <>wrote:
>>> That sounds a little complicated.
>>> Do you want to get the data out for an off node backup or is it for
>>> processing in another system ?
>>> You may get by using:
>>> * TTL to expire data via compaction
>>> * snapshots for backups
>>> Cheers
>>>   -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> On 3/01/2012, at 11:00 AM, Alexandru Sicoe wrote:
>>> Hi everyone and Happy New Year!
>>> I need advice for organizing data flow outside of my 3 node Cassandra
>>> 0.8.6 cluster. I am configuring my keyspace to use the
>>> NetworkTopologyStrategy. I have 2 data centers, each with a replication
>>> factor of 1 (i.e. DC1:1; DC2:1). The configuration of the PropertyFileSnitch is:
>>> ip_node1=DC1:RAC1
>>> ip_node2=DC2:RAC1
>>> ip_node3=DC1:RAC1
>>> I assign tokens like this:
>>>                         node1 = 0
>>>                         node2 = 1
>>>                         node3 = 85070591730234615865843651857942052864
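
The token values above match what an even split of the RandomPartitioner
range (0 to 2**127) gives when each data center is balanced on its own and
the second data center is shifted by 1 to avoid a token collision. The
helper below is a sketch of that arithmetic; the function name and the
`offset` parameter are illustrative, not part of any Cassandra tool.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token range


def balanced_tokens(node_count, offset=0):
    """Evenly spaced tokens for node_count nodes, shifted by offset."""
    return [(i * RING_SIZE // node_count + offset) % RING_SIZE
            for i in range(node_count)]


dc1_tokens = balanced_tokens(2)            # node1 and node3
dc2_tokens = balanced_tokens(1, offset=1)  # node2
print(dc1_tokens)  # [0, 85070591730234615865843651857942052864]
print(dc2_tokens)  # [1]
```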
>>> My write consistency level is ANY.
>>> My data sources are only inserting data in node1 & node3. Essentially
>>> what happens is that a replica of every input value will end up on node2.
>>> Node 2 thus has a copy of the entire data written to the cluster. When
>>> Node2 starts getting full, I want to have a script which pulls it off-line
>>> and does a sequence of operations
>>> (compaction/snapshotting/exporting/truncating the CFs) in order to back up
>>> the data in a remote place and to free it up so that it can take more data.
>>> When it comes back on-line it will take hints from the other 2 nodes.
>>> This is how I plan on shipping data out of my cluster without any
>>> downtime or any major performance penalty. The problem is when I want to
>>> truncate the CFs in node1 & node3 as well, to free them of data. I
>>> don't know whether I can do this without any downtime or without any
>>> serious performance penalties. Is anyone using truncate to free up CFs of
>>> data? How efficient is this?
>>> Any observations or suggestions are much appreciated!
>>> Cheers,
>>> Alex
