cassandra-commits mailing list archives

From "Antti Nissinen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-10306) Splitting SSTables in time, deleting and archiving SSTables
Date Fri, 11 Sep 2015 12:04:45 GMT
Antti Nissinen created CASSANDRA-10306:
------------------------------------------

             Summary: Splitting SSTables in time, deleting and archiving SSTables
                 Key: CASSANDRA-10306
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10306
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Antti Nissinen
             Fix For: 2.1.x, 2.2.x


This document is a continuation of [CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195]
and describes the need to be able to split SSTables along time boundaries, as also discussed in [CASSANDRA-8361|https://issues.apache.org/jira/browse/CASSANDRA-8361].
The data model is explained briefly, followed by the practical issues of running Cassandra with
time series data and the need for splitting capabilities.

Data model: (snippet from [CASSANDRA-9644|https://issues.apache.org/jira/browse/CASSANDRA-9644])
The data is time series data. It is stored so that one row contains a certain time span of data
for a given metric (20 days in this case). The row key contains the start time of the time span
and the metric name. The column name gives the offset from the beginning of the time span. The
column timestamp is set to the actual timestamp of the data point, i.e. the timestamp from the
row key plus the offset. The data model is analogous to the KairosDB implementation.
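
To make the mapping concrete, here is a minimal Python sketch of how a data point's timestamp splits into the 20-day row key and the column offset. The helper names are hypothetical and a plain text row key is used purely for illustration; it is not the actual on-disk key layout.

{code:python}
# Minimal sketch of a KairosDB-style row key / column offset split.
# One row covers a 20-day span of a single metric.

ROW_SPAN_MS = 20 * 24 * 60 * 60 * 1000  # 20 days in milliseconds

def to_row_and_offset(metric: str, ts_ms: int) -> tuple[str, int]:
    """Return the (row key, column offset) for a data point timestamp."""
    row_start = ts_ms - (ts_ms % ROW_SPAN_MS)   # start of the 20-day span
    offset = ts_ms - row_start                  # offset from the span start
    return f"{metric}:{row_start}", offset

def to_timestamp(row_key: str, offset: int) -> int:
    """Recover the actual data point timestamp from row key + offset."""
    row_start = int(row_key.rsplit(":", 1)[1])
    return row_start + offset

if __name__ == "__main__":
    row_key, offset = to_row_and_offset("sensor.temperature", 1_441_972_800_000)
    assert to_timestamp(row_key, offset) == 1_441_972_800_000
    print(row_key, offset)
{code}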

In the practical application data is added to the column family in real time. When converting
from a legacy system, old data is pre-loaded in chronological order by faking the column
timestamps before real-time data collection is started. However, there is intermittently a need
to insert older data into the database as well, either because it was not available in real time
or because additional time series are fed in afterwards due to unforeseen needs.

Adding old data simultaneously with real-time data leads to SSTables containing data from a time
period that exceeds the length of the compaction window (TWCS and DTCS). As a result, these
SSTables do not behave in a predictable manner in the compaction process.
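
As a rough sketch of the problem (assumed one-day windows; not Cassandra's actual implementation), TWCS-style bucketing assigns an SSTable to the window of its maximum timestamp, so an SSTable mixing back-filled and real-time data lands in the newest window even though its contents span many windows:

{code:python}
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=1)  # assumed compaction window size

def window_start(ts: datetime) -> datetime:
    """Start of the time window that a timestamp falls into."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    n = int((ts - epoch) / WINDOW)
    return epoch + n * WINDOW

# min/max data timestamps of a hypothetical SSTable holding both
# back-filled and real-time data
sstable_min = datetime(2015, 8, 1, tzinfo=timezone.utc)
sstable_max = datetime(2015, 9, 11, tzinfo=timezone.utc)

# bucketing by the maximum timestamp puts the whole SSTable into the
# newest window ...
print("bucketed into window starting:", window_start(sstable_max))
# ... even though its data actually covers dozens of windows
print("data spans", (sstable_max - sstable_min).days, "daily windows")
{code}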

Tombstones mask the data from queries, but releasing disk space requires that the SSTables
containing the tombstones are compacted together with the SSTables holding the original data.
When using TWCS or DTCS and writing tombstones with timestamps corresponding to the current
time, the SSTables containing the original data will never end up being compacted with the
SSTables containing the tombstones. Even when tombstones are written with faked timestamps,
they should be written into SSTables separate from the ongoing real-time data; otherwise the
SSTables have to be split (see below).
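
As a hedged illustration of the faked-timestamp approach (hypothetical table and column names), the DELETE can be issued with USING TIMESTAMP set just above the write timestamp of the data it should mask, rather than the current time, so the tombstone's timestamp stays within the same time window as the original data:

{code:python}
def delete_with_faked_timestamp(metric: str, row_start_ms: int,
                                data_write_time_us: int) -> str:
    """Build a CQL DELETE whose write timestamp shadows the original data."""
    # a deletion masks cells whose write timestamp is <= the delete's
    # timestamp, so one microsecond above the original write time suffices
    return (f"DELETE FROM ts_data USING TIMESTAMP {data_write_time_us + 1} "
            f"WHERE metric = '{metric}' AND row_start = {row_start_ms}")

print(delete_with_faked_timestamp("sensor.temperature",
                                  1_440_288_000_000,        # row span start (ms)
                                  1_440_974_000_000_000))   # original write time (us)
{code}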


TTL is a workable method for deleting data from a column family and releasing disk space in a
predictable manner. However, setting the correct TTL is not a trivial task: the required TTL
might change, e.g. due to legislation, or the customer might want a longer lifetime for the data.


The other factor affecting disk space consumption is the variability in the rate at which data
is fed to the column family. In certain troubleshooting cases the sample rate can be increased
tenfold for a large portion of the collected time series. This leads to rapid consumption of
disk space, and old data has to be deleted / archived in such a manner that disk space is
released quickly and predictably.

Losing one or more nodes from the cluster without spare hardware also leads to a situation where
the data from the lost node has to be replicated again onto the remaining nodes. This increases
disk space consumption per node and probably requires cleaning some older data out of the active
column family.

All of the above issues could of course be handled simply by adding more disk space or nodes to
the cluster. In a cloud environment that would be a feasible option. For an application running
on real hardware in an isolated environment it is not, for practical reasons or because of
costs. Getting new hardware on site may take a long time, e.g. due to customs regulations.

In this application domain (time series data collection) the data is not modified after it is
inserted into the column family. There are only read operations and the deletion / archiving of
old data based on TTL or operator actions.

The above reasoning leads to the following conclusions and proposals.

* TWCS and DTCS (with certain modifications) produce well-structured SSTables organized along
the time axis, giving opportunities to manage the available disk capacity on the nodes.
Recovering from repairs also works (the flood of small SSTables is compacted together with
larger ones).
* Being able to effectively split SSTables along a given timeline would produce SSTable sets on
all nodes that allow deleting or archiving SSTables. What would be the mechanism to inactivate
SSTables during deletion / archiving so that nodes don't start streaming the "missing" data
between nodes (repairs)?
* Being able to split existing SSTables along multiple timelines determined by TWCS would allow
older data to be inserted into the column family and eventually compacted in the desired manner
into the correct time window. The original SSTable would be streamed into several SSTables
according to the time windows, and any empty SSTables would be discarded in the end. A sketch of
this redistribution is given after this list.
* The splitting action would be a tool executed through the nodetool command when needed.
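
The following sketch, assuming one-day windows, shows the kind of redistribution the proposed splitting would perform at the level of individual data points; a real tool would rewrite SSTable files on disk (in the spirit of the existing size-based sstablesplit utility), not in-memory tuples as here:

{code:python}
from collections import defaultdict

WINDOW_MS = 24 * 60 * 60 * 1000  # assumed one-day TWCS window

def split_by_window(points):
    """Group (timestamp_ms, value) pairs by the time window they fall into."""
    windows = defaultdict(list)
    for ts_ms, value in points:
        windows[ts_ms - (ts_ms % WINDOW_MS)].append((ts_ms, value))
    return windows

# a hypothetical SSTable mixing back-filled data with fresh data
points = [(1_441_200_000_000, 1.0),   # back-filled, early September
          (1_441_210_000_000, 2.0),   # same day as the first point
          (1_441_972_800_000, 3.0)]   # current real-time data

# each group would become its own output SSTable; empty groups are discarded
for start, rows in sorted(split_by_window(points).items()):
    print(f"window starting {start}: {len(rows)} points")
{code}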




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
