Date: Fri, 11 Sep 2015 12:04:45 +0000 (UTC)
From: "Antti Nissinen (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Created] (CASSANDRA-10306) Splitting SSTables in time, deleting and archiving SSTables

Antti Nissinen created CASSANDRA-10306:
------------------------------------------

Summary: Splitting SSTables in time, deleting and archiving SSTables
Key: CASSANDRA-10306
URL: https://issues.apache.org/jira/browse/CASSANDRA-10306
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Antti Nissinen
Fix For: 2.1.x, 2.2.x

This document is a continuation of [CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195] and describes the need to be able to split SSTable files time-wise, as also discussed in [CASSANDRA-8361|https://issues.apache.org/jira/browse/CASSANDRA-8361].
The data model is explained briefly, followed by the practical issues of running Cassandra with time series data and the needs for the splitting capabilities.

Data model (snippet from [CASSANDRA-9644|https://issues.apache.org/jira/browse/CASSANDRA-9644]):

The data is time series data. It is saved so that one row contains a certain time span of data for a given metric (20 days in this case). The row key contains the start time of the time span and the metric name. The column name gives the offset from the beginning of the time span. The column timestamp is set by adding the offset to the timestamp from the row key, i.e. to the actual timestamp of the data point. The data model is analogous to the KairosDB implementation.

In the practical application, data is added in real time into the column family. When converting from a legacy system, old data is pre-loaded in chronological order by faking the timestamp of the column before starting the real-time data collection. However, there is intermittently a need to insert older data into the database as well, because it has not been available in real time, or because additional time series are fed in afterwards due to unforeseeable needs.

Adding old data simultaneously with real-time data leads to SSTables that contain data from a time period exceeding the length of the compaction window (TWCS and DTCS). Therefore the SSTables do not behave in a predictable manner in the compaction process.

Tombstones mask the data from queries, but releasing the disk space requires that the SSTables containing the tombstones are compacted together with the SSTables holding the original data. When using TWCS or DTCS and writing tombstones with timestamps corresponding to real time, the SSTables containing the original data will not end up being compacted with the SSTables holding the tombstones.
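As an illustration (not part of the ticket), the KairosDB-style keying and the time-window overlap problem described above can be sketched in Python. The function names, the metric name, and the 1-day `WINDOW_MS` compaction window are assumptions made for this sketch; only the 20-day row span comes from the description.

```python
from datetime import datetime, timezone

MS = 1000
ROW_SPAN_MS = 20 * 24 * 3600 * MS      # 20-day row span, as in the data model above
WINDOW_MS = 24 * 3600 * MS             # assumed 1-day TWCS compaction window

def to_ms(dt: datetime) -> int:
    """UTC datetime -> epoch milliseconds."""
    return int(dt.timestamp() * MS)

def row_key(metric: str, ts_ms: int) -> tuple:
    """Row key = (start of the 20-day span, metric name)."""
    return (ts_ms - ts_ms % ROW_SPAN_MS, metric)

def column_offset(ts_ms: int) -> int:
    """Column name = offset of the data point from the row's start time."""
    return ts_ms % ROW_SPAN_MS

def twcs_window(ts_ms: int) -> int:
    """Index of the compaction window a write timestamp falls into."""
    return ts_ms // WINDOW_MS

now = to_ms(datetime(2015, 9, 11, 12, 0, tzinfo=timezone.utc))
old = now - 30 * 24 * 3600 * MS        # back-filled point, 30 days in the past

# The actual timestamp of a point is recovered from row start + offset.
start, _ = row_key("pump.pressure", old)
assert start + column_offset(old) == old

# A live write and a back-filled write flushed together produce one SSTable
# whose data spans many compaction windows -- the problem described above.
assert twcs_window(now) != twcs_window(old)
```

The last assertion is the crux: once a single memtable flush mixes those two writes, the resulting SSTable straddles windows and TWCS/DTCS can no longer place it cleanly.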
Even if the tombstones are written with faked timestamps, the resulting SSTable has to be written apart from the ongoing real-time data; otherwise the SSTables have to be split (see later).

TTL is a working method for deleting data from a column family and releasing disk space in a predictable manner. However, setting the correct TTL is not a trivial task. The required TTL might change, e.g. due to legislation, or because the customer would like a longer lifetime for the data.

The other factor affecting disk space consumption is the variability of the rate at which data is fed to the column family. In certain troubleshooting cases the sample rate can be increased tenfold for a large portion of the collected time series. This leads to rapid consumption of disk space, and old data has to be deleted or archived in such a manner that disk space is released quickly and predictably.

Losing one or more nodes from the cluster without having spare hardware also leads to a situation where data from the lost node has to be replicated again onto the remaining nodes. This increases disk space consumption per node and probably requires cleaning some older data out of the active column family.

All of the above issues could of course be handled just by adding more disk space or nodes to the cluster. In a cloud environment that would be a feasible option. For an application sitting on real hardware in an isolated environment it is not feasible, for practical reasons or due to costs; getting new hardware on site might take a long time, e.g. due to customs regulations.

In the application domain (time series data collection) the data is not modified after being inserted into the column family. There are only read operations and deletion/archiving of old data based on the TTL or operator actions.

The above reasoning leads to the following conclusions and proposals.
* TWCS and DTCS (with certain modifications) lead to well-structured SSTables where the tables are organized time-wise, giving opportunities to manage the available disk capacity on nodes. Recovery from repairs also works (compacting the flood of small SSTables with larger ones).
* Being able to effectively split the SSTables along a given time line would lead to SSTable sets on all nodes that allow deleting or archiving SSTables. What would be the mechanism to inactivate SSTables during deletion/archiving so that nodes don't start streaming "missing" data between nodes (repairs)?
* Being able to split existing SSTables along multiple timelines determined by TWCS would allow insertion of older data into the column family that would eventually be compacted in the desired manner in the correct time window. The original SSTable would be split into several SSTables according to the time windows. In the end, empty SSTables would be discarded.
* The splitting action would be a tool executed through the nodetool command when needed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)