cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: emptying my cluster
Date Thu, 05 Jan 2012 19:12:37 GMT
> * In the design discussed it is perfectly reasonable for data not to be on the archive node.
> 
> You mean when having the 2 DC setup I mentioned and using TTL? If I have the 2 DC setup but don't use TTL, why wouldn't the data be on the archive node?
Originally you were talking about taking the archive node down and then having HH (hinted handoff) write the hints back. HH is not considered a reliable mechanism for obtaining consistency; it is better in 1.0, but repair is AFAIK still considered the way to achieve consistency. For example, HH only collects hints for a down node for 1 hour. Also, a read operation will check consistency and may repair it; snapshots do not do that.

Finally, if you write into the DC with 2 nodes at a CL other than QUORUM or EACH_QUORUM, there is no guarantee that the write will be committed in the other DC.
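
For illustration, a minimal sketch of a client write at EACH_QUORUM using the pycassa Thrift client; the keyspace, CF, column and host names below are placeholders, not anything from this thread:

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily
    from pycassa.cassandra.ttypes import ConsistencyLevel

    pool = ConnectionPool('MonitoringKS', ['node1:9160', 'node3:9160'])
    cf = ColumnFamily(pool, 'RawData')

    # EACH_QUORUM only returns once a quorum of replicas in *every* DC has
    # acknowledged the write, so the archive DC is known to have the data.
    cf.insert('some_row_key', {'some_column': 'some_value'},
              write_consistency_level=ConsistencyLevel.EACH_QUORUM)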
 
>  So what data format should I use for historical archiving?
A plain text file, with documentation, so that anyone who follows you can work with the data.
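
As a sketch only (the keyspace, CF and file names are placeholders), one way to dump a CF to compressed plain text with pycassa:

    import csv, gzip
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('MonitoringKS', ['node2:9160'])
    cf = ColumnFamily(pool, 'RawData')

    # Write one "row_key, column_name, value" record per column to a
    # gzipped CSV - a plain format anyone can read back years later.
    with gzip.open('archive.csv.gz', 'wb') as out:
        writer = csv.writer(out)
        for row_key, columns in cf.get_range():
            for name, value in columns.items():
                writer.writerow([row_key, name, value])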

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/01/2012, at 12:31 AM, Alexandru Sicoe wrote:

> Hi,
> 
> On Wed, Jan 4, 2012 at 9:54 PM, aaron morton <aaron@thelastpickle.com> wrote:
> Some thoughts on the plan:
> 
> * You are monkeying around with things; do not be surprised when surprising things happen.
> 
> I am just trying to explore different solutions for solving my problem.
>  
> * Deliberately unbalancing the cluster may lead to Bad Things happening. 
> 
> I will take your advice on this. I would have liked an extra node so I could have 2 nodes in each DC.
>  
> * In the design discussed it is perfectly reasonable for data not to be on the archive node.
> 
> You mean when having the 2 DC setup I mentioned and using TTL? If I have the 2 DC setup but don't use TTL, why wouldn't the data be on the archive node?
>  
> * Truncate is a cluster-wide operation and all nodes must be online before it will start.
> * Truncate will snapshot before deleting data; you could use this snapshot. 
> * A TTL applies to the column itself, no matter which node the column is on. 
> 
> Thanks for clarifying these!
>  
> * IMHO Cassandra data files (sstables or JSON dumps) are not a good format for a historical archive - nothing against Cassandra. You need the lowest-common-denominator format. 
> 
> So what data format should I use for historical archiving?
>  
> 
> If you have the resources for a second cluster, could you put the two together and just have one cluster with a very large retention policy? One cluster is easier than two.  
> 
> I am constrained to have limited retention on the Cassandra cluster that is collecting the data. Once I archive the data for long-term storage, I cannot bring it back into the same Cassandra cluster that collected it in the first place, because it's in an enclosed network with strict rules. I have to load it into another cluster outside the enclosed network.
> It's not that I have the resources for a second cluster; I am forced to use a second cluster.
>  
> 
> Assuming there is no business case for this, consider either:
> 
> * Dumping the historical data into a Hadoop (with or without HDFS) cluster with high compression. If needed, you could then run Hive / Pig to fill a companion Cassandra cluster with data on demand. Or just query it using Hadoop.
> * Dumping the historical data to files with high compression and a roll-your-own solution to fill a cluster. 
> 
> Ok, thanks for these suggestions, I will have to investigate further.
>  
> Also consider talking to DataStax about DSE. 
> 
> Cheers 
>   
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 5/01/2012, at 1:41 AM, Alexandru Sicoe wrote:
> 
> 
> Cheers,
> Alex 
>> Hi,
>> 
>> On Tue, Jan 3, 2012 at 8:19 PM, aaron morton <aaron@thelastpickle.com> wrote:
>> Running a time-based rolling window of data can be done using TTL. Backing up the nodes for disaster recovery can be done using snapshots. Restoring to any point in time will be tricky, because you may restore columns where the TTL has expired. 
>>  
>> Yeah, that's the thing... if I want to use the system as I explain further below, I cannot back up data (for later restoration) if I'm using TTLs. 
>>  
>> 
>>> Will I get a single copy of the data in the remote storage or will it be twice the data (data + replica)?
>> You will get RF copies of the data. (By the way, there is no "original" copy.)
>> 
>> Well, if I organize the cluster as I mentioned in the first email, I will get one copy of each row (as of a certain point in time) on node2 if I take it offline and perform a major compaction and GC, won't I? I don't want to send duplicate data to the mass storage!
>>  
>> 
>> Can you share a bit more about the use case? How much data and what sort of read patterns? 
>> 
>> 
>> I have several applications that feed into Cassandra about 2 million different variables (each representing a different monitoring value/channel). The system receives updates for each of these monitoring values at different rates. For each new update, the timestamp and value are recorded in a Cassandra name-value pair.
>> The Cassandra schema is built using one CF for data and 4 other CFs for metadata (the metadata CFs are static - they hardly grow at all once they've been loaded). The data CF uses a row for each variable, and each row acts as a 4-hour time bin. I achieve this by creating the row key as a concatenation of the first 6 digits of the timestamp at which the data is inserted and the unique ID of the variable. After the time bin expires, a new row will be created for the same variable ID.
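>> 
>> (A rough sketch of that row-key scheme, for illustration only - the helper name and the ':' separator are placeholders:)
>> 
>>     import time
>> 
>>     def make_row_key(variable_id, ts=None):
>>         # The first 6 digits of the epoch timestamp select the coarse time bin;
>>         # appending the variable ID gives one row per variable per bin.
>>         ts = int(ts if ts is not None else time.time())
>>         return str(ts)[:6] + ':' + str(variable_id)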
>> 
>> The system can currently sustain the insertion load. Now I'm looking into organizing the flow of data out of the cluster, and into retrieval performance for random queries:
>> 
>> Why do I need to organize the data out? Well, my requirement is to keep all the data coming into the system at the highest granularity for the long term (several years).
>> The 3-node cluster I mentioned is the online cluster, which is supposed to be able to absorb the input load for a relatively short period of time - a few weeks (I am constrained to do this).
>> After this period the data has to be shipped out of the cluster into a mass storage facility, and the cluster needs to be emptied to make room for more data. Also, the online cluster will serve reads while it takes in data.
>> For older data I am planning to have another cluster that gets loaded with data from the storage facility on demand and will serve reads from there.
>> 
>> Why random queries? There is no specific use case for them; that's why I want to rely only on the built-in Cassandra indexes for now. Generally the client will ask for sets of values within a time range up to 8-10 hours in the past.
>> Apart from some sets of variables that will almost always be asked for together, any combination is possible, because this system will feed a web dashboard used for debugging purposes - to correlate and aggregate streams of variables. Depending on the problem, different variable combinations could be investigated. 
>>  
>> Can you split the data stream into a permanent log record, and also into Cassandra for a rolling window of queryable data?   
>> 
>> In the end, essentially that's what I've been meaning to do by organizing the cluster in a 2 DC setup: I wanted to have 2 nodes in DC1 taking the data and reads (the rolling window) and replicating to the node in DC2 (the permanent log - a single copy of the data).
>> I was thinking of implementing the rolling window by emptying the nodes in DC1 using truncate, instead of what you propose now with the rolling window using TTL. 
>> 
>> Ok, so I can do what you are saying easily if Cassandra allows me to have a TTL only on the first copy of the data and have the second replica without a TTL. Is this possible? I think it would solve my problem, as long as I can back up and empty the node in DC2 before the TTLs expire on the other 2 nodes.
>> 
>> Cheers,
>> Alex
>> 
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 3/01/2012, at 11:41 PM, Alexandru Sicoe wrote:
>> 
>>> Hi,
>>> 
>>> I need to build a system that stores data for years, so yes, I am backing up data in another mass storage system from where it can be accessed later. The data that I successfully back up has to be deleted from my cluster to make space for new data coming in.
>>> 
>>> I was aware of snapshotting, which I will use for getting the data out of node2: it creates hard links to the SSTables of a CF, and then I can copy the files pointed to by the hard links to another location.
>>> After that I get rid of the snapshot (the hard links) and then I can truncate my CFs. It's clear that snapshotting will give me a single copy of the data if I have a unique copy of the data on one node.
>>> It's not clear to me what happens if I have, let's say, a cluster with 3 nodes and RF=2 and I do a snapshot of every node and copy those snapshots to remote storage. Will I get a single copy of the data in the remote storage or will it be twice the data (data + replica)?
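>>> 
>>> (Roughly the sequence I have in mind, sketched with placeholder host names and paths - the snapshot directory layout depends on the Cassandra version:)
>>> 
>>>     import glob, os, shutil, subprocess
>>> 
>>>     DATA_DIR = '/var/lib/cassandra/data/MyKeyspace'   # assumed data directory
>>>     ARCHIVE = '/mnt/mass_storage/cassandra_archive'
>>> 
>>>     # 1. Snapshot: creates hard links to the current SSTables.
>>>     subprocess.check_call(['nodetool', '-h', 'node2', 'snapshot'])
>>> 
>>>     # 2. Copy the hard-linked SSTable files out to mass storage.
>>>     for path in glob.glob(os.path.join(DATA_DIR, 'snapshots', '*', '*')):
>>>         shutil.copy(path, ARCHIVE)
>>> 
>>>     # 3. Drop the hard links once the copy has been verified.
>>>     subprocess.check_call(['nodetool', '-h', 'node2', 'clearsnapshot'])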
>>> 
>>> I've started reading about TTL and I think I can use it, but it's not clear to me how it would work in conjunction with the snapshotting/backing up I need to do. I mean, it will impose a deadline by which I need to perform a backup in order not to miss any data.
>>> Also, I might duplicate data if some columns don't expire fully between 2 backups. Any clarifications on this?
>>> 
>>> Cheers,
>>> Alex
>>> 
>>> On Tue, Jan 3, 2012 at 9:44 AM, aaron morton <aaron@thelastpickle.com> wrote:
>>> That sounds a little complicated. 
>>> 
>>> Do you want to get the data out for an off-node backup, or is it for processing in another system? 
>>> 
>>> You may get by using:
>>> 
>>> * TTL to expire data via compaction
>>> * snapshots for backups
>>> 
>>> Cheers
>>> 
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 3/01/2012, at 11:00 AM, Alexandru Sicoe wrote:
>>> 
>>>> Hi everyone and Happy New Year!
>>>> 
>>>> I need advice for organizing data flow out of my 3-node Cassandra 0.8.6 cluster. I am configuring my keyspace to use the NetworkTopologyStrategy. I have 2 data centers, each with a replication factor of 1 (i.e. DC1:1; DC2:1). The configuration of the PropertyFileSnitch is:
>>>> 
>>>>                         ip_node1=DC1:RAC1
>>>>                         ip_node2=DC2:RAC1
>>>>                         ip_node3=DC1:RAC1
>>>> I assign tokens like this:
>>>>                         node1 = 0
>>>>                         node2 = 1
>>>>                         node3 = 85070591730234615865843651857942052864
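>>>> 
>>>> (For reference, a sketch of how such a keyspace could be defined with pycassa's SystemManager - the keyspace name is a placeholder, and the exact keyword arguments may differ between pycassa versions:)
>>>> 
>>>>     from pycassa.system_manager import SystemManager
>>>> 
>>>>     sys_mgr = SystemManager('node1:9160')
>>>>     # One replica in each data center, matching the snitch layout above.
>>>>     sys_mgr.create_keyspace('MonitoringKS',
>>>>                             replication_strategy='NetworkTopologyStrategy',
>>>>                             strategy_options={'DC1': '1', 'DC2': '1'})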
>>>> 
>>>> My write consistency level is ANY.
>>>> 
>>>> My data sources are only inserting data into node1 & node3. Essentially what happens is that a replica of every input value will end up on node2, so node2 has a copy of all the data written to the cluster.
>>>> When node2 starts getting full, I want to have a script which pulls it offline and does a sequence of operations (compaction/snapshotting/exporting/truncating the CFs) in order to back up the data to a remote place and free the node up so that it can take more data. When it comes back online it will take hints from the other 2 nodes.
>>>> 
>>>> This is how I plan on shipping data out of my cluster without any downtime or major performance penalty. The problem is that I also want to truncate the CFs on node1 & node3 to free them of data as well.
>>>> I don't know whether I can do this without downtime or serious performance penalties. Is anyone using truncate to free CFs of data? How efficient is it?
>>>> 
>>>> Any observations or suggestions are much appreciated!
>>>> 
>>>> Cheers,
>>>> Alex
>>> 
>>> 
>> 
>> 
> 
> 

