cassandra-commits mailing list archives

From "Antti Nissinen (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10306) Splitting SSTables in time, deleting and archiving SSTables
Date Fri, 09 Oct 2015 13:57:27 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950417#comment-14950417 ]

Antti Nissinen edited comment on CASSANDRA-10306 at 10/9/15 1:57 PM:
---------------------------------------------------------------------

The idea is the following:
- Keep TTL as it is currently, so that compactions (DTCS and TWCS) can drop SSTables
that are fully expired. Splitting SSTables along time boundaries during compaction (as you
proposed) will make sure that there are no SSTables covering such a large time span that they
never get dropped (i.e. holding a large number of old data points plus a few points from the
recent past).

- Sometimes there is a need to clean up data from a column family quickly and effectively:
given a timeline, all data behind it is deleted or archived to different media (i.e. remove
all data older than a given timestamp, where age is taken from the column timestamp). That
requires splitting every SSTable that has data on both sides of the timeline. If compaction
has been working as expected, there are probably only a couple of SSTables to split.
- If all SSTables for a given column family on each node were split along the given timeline,
and SSTables could be "inactivated" (taken out of the active SSTable set), then the SSTables
could be removed or moved elsewhere without repair operations starting to replicate the
"missing" data while the nodes are out of sync during the move.

That would effectively correspond to a temporary change of the global TTL, as you said. The
TTL is probably stored as an absolute timestamp in the columns, so the algorithm should use
some kind of offset value to arrive at the desired timeline for deletion / archiving
(sketched below).
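
For illustration, a minimal sketch of that offset-to-timeline conversion and the resulting
per-SSTable classification (names like SSTableStats are invented here; Cassandra does track
per-SSTable min/max cell timestamps, but not under these names):

    import java.util.concurrent.TimeUnit;

    public class TimelineSplit {
        enum Action { DROP_OR_ARCHIVE, KEEP, SPLIT }

        // Hypothetical stand-in for the min/max cell timestamps
        // (microseconds) that Cassandra records per SSTable.
        static class SSTableStats {
            final long minTimestampMicros, maxTimestampMicros;
            SSTableStats(long min, long max) { minTimestampMicros = min; maxTimestampMicros = max; }
        }

        // Cell timestamps are absolute, so a relative retention offset must
        // first be turned into an absolute cutoff ("the timeline").
        static long cutoffMicros(long retentionDays) {
            long nowMicros = TimeUnit.MILLISECONDS.toMicros(System.currentTimeMillis());
            return nowMicros - TimeUnit.DAYS.toMicros(retentionDays);
        }

        static Action classify(SSTableStats s, long cutoff) {
            if (s.maxTimestampMicros < cutoff) return Action.DROP_OR_ARCHIVE; // fully behind the timeline
            if (s.minTimestampMicros >= cutoff) return Action.KEEP;           // fully in front of it
            return Action.SPLIT;                                              // straddles the timeline
        }
    }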




> Splitting SSTables in time, deleting and archiving SSTables
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-10306
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10306
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Antti Nissinen
>
> This document is a continuation of [CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195]
> and describes the need to be able to split SSTables time-wise, as also discussed in
> [CASSANDRA-8361|https://issues.apache.org/jira/browse/CASSANDRA-8361]. The data model is
> explained briefly, followed by the practical issues of running Cassandra with time series
> data and the needs for the splitting capabilities.
> Data model (snippet from [CASSANDRA-9644|https://issues.apache.org/jira/browse/CASSANDRA-9644]):
> The data is time series data, saved so that one row contains a certain time span of data
> for a given metric (20 days in this case). The row key contains the start time of the time
> span and the metric name. The column name gives the offset from the beginning of the time
> span. The column timestamp is set by adding the offset to the timestamp in the row key,
> i.e. the actual timestamp of the data point. The data model is analogous to the KairosDB
> implementation.
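> As an illustrative sketch only (the class, method names, and key format are invented, not
> KairosDB or Cassandra code), the described keying could look like:
>
>     import java.util.concurrent.TimeUnit;
>
>     public class RowKeying {
>         // One row spans 20 days of one metric, per the description above.
>         static final long SPAN_MILLIS = TimeUnit.DAYS.toMillis(20);
>
>         // Row key combines the span start time with the metric name.
>         static String rowKey(String metric, long timestampMillis) {
>             long spanStart = (timestampMillis / SPAN_MILLIS) * SPAN_MILLIS;
>             return spanStart + ":" + metric;
>         }
>
>         // Column name is the offset inside the span; adding it back to the
>         // span start reconstructs the data point's actual timestamp, which
>         // is also used as the cell's write timestamp.
>         static long columnOffset(long timestampMillis) {
>             return timestampMillis % SPAN_MILLIS;
>         }
>     }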
> In the practical application, data is added to the column family in real time. When
> converting from a legacy system, old data is pre-loaded in time order by faking the column
> timestamps before the real-time data collection starts. However, there is intermittently a
> need to insert older data into the database as well, because it has not been available in
> real time, or because additional time series are fed in afterwards due to unforeseeable needs.
> Adding old data simultaneously with real-time data leads to SSTables that contain data
> from a time period exceeding the length of the compaction window (TWCS and DTCS). Such
> SSTables therefore do not behave in a predictable manner in the compaction process.
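> To make the effect concrete, a sketch of TWCS-style bucketing (the one-day window and the
> flooring are assumptions for illustration, not TWCS's exact code): an SSTable whose min and
> max timestamps floor to different windows no longer belongs to any single compaction bucket:
>
>     import java.util.concurrent.TimeUnit;
>
>     public class WindowBucketing {
>         static final long WINDOW_MILLIS = TimeUnit.DAYS.toMillis(1); // assumed window size
>
>         // TWCS-style: floor a timestamp to the start of its window.
>         static long windowStart(long timestampMillis) {
>             return (timestampMillis / WINDOW_MILLIS) * WINDOW_MILLIS;
>         }
>
>         // An SSTable mixing backfilled old points with live points spans
>         // several windows and therefore lingers across compactions.
>         static boolean straddlesWindows(long minTsMillis, long maxTsMillis) {
>             return windowStart(minTsMillis) != windowStart(maxTsMillis);
>         }
>     }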
> Tombstones mask the data from queries, but releasing the disk space requires that SSTables
> containing the tombstones are compacted together with the SSTables holding the original
> data. With TWCS or DTCS, tombstones written with real-time timestamps will never end up in
> the same compaction as the SSTables containing the original, older data. Even when writing
> tombstones with faked timestamps, they should be written into SSTables separate from the
> ongoing real-time data; otherwise the SSTables have to be split (see later).
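> As an example of the faked-timestamp approach (illustrative Java around real CQL; the
> keyspace, table, and key are invented), a tombstone can be issued with an explicit write
> timestamp so that it falls into the same time window as the cells it shadows:
>
>     public class BackdatedTombstone {
>         public static void main(String[] args) {
>             // Microsecond write timestamp matching the shadowed cells, so
>             // TWCS/DTCS buckets the tombstone with the original data.
>             long originalWriteMicros = 1420070400000000L;
>             String cql = "DELETE FROM metrics.points USING TIMESTAMP " + originalWriteMicros
>                        + " WHERE row_key = '1420070400000:sensor-1'";
>             System.out.println(cql); // would be run through cqlsh or a driver
>         }
>     }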

> TTL is a working method for deleting data from a column family and releasing disk space in
> a predictable manner. However, setting the correct TTL is not a trivial task: the required
> TTL might change, e.g. due to legislation, or because the customer would like a longer
> lifetime for the data.
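> A sketch of why a later TTL change cannot fix data already on disk (the field names are
> assumptions modelled loosely on Cassandra's cell metadata): the TTL is resolved into an
> absolute deletion time when the cell is written:
>
>     public class TtlExpiry {
>         // Fixed at write time: the cell expires at write time + TTL.
>         static long localDeletionTimeSeconds(long writeTimeSeconds, int ttlSeconds) {
>             return writeTimeSeconds + ttlSeconds;
>         }
>
>         // A whole SSTable can be dropped once its newest cell has expired;
>         // lengthening the TTL later only affects newly written cells.
>         static boolean fullyExpired(long maxLocalDeletionTimeSeconds, long nowSeconds) {
>             return maxLocalDeletionTimeSeconds <= nowSeconds;
>         }
>     }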
> The other factor affecting disk space consumption is the variability in the rate at which
> data is fed into the column family. In certain troubleshooting cases the sample rate can be
> increased tenfold for a large portion of the collected time series. This leads to rapid
> consumption of disk space, and old data has to be deleted / archived in such a manner that
> disk space is released quickly and predictably.
> Losing one or more nodes from the cluster without spare hardware also leads to a situation
> where the data from the lost node has to be replicated again across the remaining nodes.
> This increases disk space consumption per node and probably requires cleaning some older
> data out of the active column family.
> All of the above issues could of course be handled simply by adding more disk space or
> nodes to the cluster. In a cloud environment that would be a feasible option; for an
> application sitting on real hardware in an isolated environment it is not, for practical
> reasons or due to costs. Getting new hardware to a site might take a long time, e.g. due to
> customs regulations.
> In this application domain (time series data collection) the data is not modified after
> being inserted into the column family. There are only read operations and the deletion /
> archiving of old data, based on the TTL or on operator actions.
> The above reasoning leads to the following conclusions and proposals.
> * TWCS and DTCS (with certain modifications) lead to well-structured SSTables, organized
> in a timely manner, giving opportunities to manage the available disk capacity on the
> nodes. Recovery from repairs also works (the flood of small SSTables gets compacted with
> larger ones).
> * Being able to split the SSTables effectively along a given timeline would leave SSTable
> sets on all nodes that allow deletion or archiving of whole SSTables. What would be the
> mechanism to inactivate SSTables during deletion / archiving so that nodes don't start
> streaming the "missing" data between each other (repairs)?
> * Being able to split existing SSTables along multiple timelines determined by TWCS would
> allow inserting older data into the column family such that it eventually gets compacted in
> the desired manner into the correct time window. The original SSTable would be split into
> several SSTables according to the time windows; empty SSTables would be discarded in the
> end (see the sketch after this list).
> * The splitting action would be a tool executed through the nodetool command when needed.
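>
> As a sketch of the multi-timeline split above (hypothetical names; a real implementation
> would stream through SSTable writers rather than in-memory lists): each cell is routed to
> its time window, and windows that receive no cells simply produce no output SSTable:
>
>     import java.util.ArrayList;
>     import java.util.List;
>     import java.util.Map;
>     import java.util.TreeMap;
>     import java.util.concurrent.TimeUnit;
>
>     public class MultiWindowSplit {
>         static final long WINDOW_MILLIS = TimeUnit.DAYS.toMillis(1); // assumed TWCS window size
>
>         static class Cell {
>             final long timestampMillis;
>             final String value;
>             Cell(long ts, String v) { timestampMillis = ts; value = v; }
>         }
>
>         // Route every cell of one input SSTable to its window; the lists
>         // stand in for per-window SSTable writers. Empty windows never get
>         // an entry, mirroring "empty SSTables would be discarded".
>         static Map<Long, List<Cell>> split(List<Cell> input) {
>             Map<Long, List<Cell>> byWindow = new TreeMap<>();
>             for (Cell c : input) {
>                 long window = (c.timestampMillis / WINDOW_MILLIS) * WINDOW_MILLIS;
>                 byWindow.computeIfAbsent(window, w -> new ArrayList<>()).add(c);
>             }
>             return byWindow;
>         }
>     }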



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
