Date: Fri, 11 Sep 2015 12:04:45 +0000 (UTC)
From: "Antti Nissinen (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Created] (CASSANDRA-10306) Splitting SSTables in time, deleting and archiving SSTables

Antti Nissinen created CASSANDRA-10306:
------------------------------------------

Summary: Splitting SSTables in time, deleting and archiving SSTables
Key: CASSANDRA-10306
URL: https://issues.apache.org/jira/browse/CASSANDRA-10306
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Antti Nissinen
Fix For: 2.1.x, 2.2.x

This document is a continuation of [CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195] and describes the need to be able to split SSTable files time-wise, as also discussed in [CASSANDRA-8361|https://issues.apache.org/jira/browse/CASSANDRA-8361].
The data model is explained briefly, followed by the practical issues of running Cassandra with time series data and the needs for the splitting capabilities.

Data model (snippet from [CASSANDRA-9644|https://issues.apache.org/jira/browse/CASSANDRA-9644]):

The data is time series data. It is saved so that one row contains a certain time span of data for a given metric (20 days in this case). The row key contains the start time of the time span and the metric name. The column name gives the offset from the beginning of the time span. The column timestamp is set by adding the offset to the timestamp from the row key, i.e. to the actual timestamp of the data point. The data model is analogous to the KairosDB implementation.

In the practical application, data is added in real time into the column family. When converting from a legacy system, old data is pre-loaded in chronological order by faking the timestamp of the column before starting the real-time data collection. However, there is intermittently a need to insert older data into the database as well, because it has not been available in real time, or because additional time series are fed in afterwards due to unforeseeable needs.

Adding old data simultaneously with real-time data leads to SSTables that contain data from a time period exceeding the length of the compaction window (TWCS and DTCS). Therefore the SSTables do not behave in a predictable manner in the compaction process.

Tombstones mask the data from queries, but releasing the disk space requires that the SSTables containing the tombstones are compacted together with the SSTables holding the original data. When using TWCS or DTCS and writing tombstones with timestamps corresponding to real time, the SSTables containing the original data will not end up being compacted with the SSTables holding the tombstones.
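As an illustration (not part of the ticket), the KairosDB-style keying and the time-window overlap problem described above can be sketched in Python. The function names, the metric name, and the 1-day `WINDOW_MS` compaction window are assumptions made for this sketch; only the 20-day row span comes from the description.

```python
from datetime import datetime, timezone

MS = 1000
ROW_SPAN_MS = 20 * 24 * 3600 * MS      # 20-day row span, as in the data model above
WINDOW_MS = 24 * 3600 * MS             # assumed 1-day TWCS compaction window

def to_ms(dt: datetime) -> int:
    """UTC datetime -> epoch milliseconds."""
    return int(dt.timestamp() * MS)

def row_key(metric: str, ts_ms: int) -> tuple:
    """Row key = (start of the 20-day span, metric name)."""
    return (ts_ms - ts_ms % ROW_SPAN_MS, metric)

def column_offset(ts_ms: int) -> int:
    """Column name = offset of the data point from the row's start time."""
    return ts_ms % ROW_SPAN_MS

def twcs_window(ts_ms: int) -> int:
    """Index of the compaction window a write timestamp falls into."""
    return ts_ms // WINDOW_MS

now = to_ms(datetime(2015, 9, 11, 12, 0, tzinfo=timezone.utc))
old = now - 30 * 24 * 3600 * MS        # back-filled point, 30 days in the past

# The actual timestamp of a point is recovered from row start + offset.
start, _ = row_key("pump.pressure", old)
assert start + column_offset(old) == old

# A live write and a back-filled write flushed together produce one SSTable
# whose data spans many compaction windows -- the problem described above.
assert twcs_window(now) != twcs_window(old)
```

The last assertion is the crux: once a single memtable flush mixes those two writes, the resulting SSTable straddles windows and TWCS/DTCS can no longer place it cleanly.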
Even if the tombstones are written with faked timestamps, the resulting SSTable has to be written apart from the ongoing real-time data; otherwise the SSTables have to be split (see later).

TTL is a working method for deleting data from a column family and releasing disk space in a predictable manner. However, setting the correct TTL is not a trivial task. The required TTL might change, e.g. due to legislation, or because the customer would like a longer lifetime for the data.

The other factor affecting disk space consumption is the variability of the rate at which data is fed to the column family. In certain troubleshooting cases the sample rate can be increased tenfold for a large portion of the collected time series. This leads to rapid consumption of disk space, and old data has to be deleted or archived in such a manner that disk space is released quickly and predictably.

Losing one or more nodes from the cluster without having spare hardware also leads to a situation where data from the lost node has to be replicated again onto the remaining nodes. This increases disk space consumption per node and probably requires cleaning some older data out of the active column family.

All of the above issues could of course be handled just by adding more disk space or nodes to the cluster. In a cloud environment that would be a feasible option. For an application sitting on real hardware in an isolated environment it is not feasible, for practical reasons or due to costs; getting new hardware on site might take a long time, e.g. due to customs regulations.

In the application domain (time series data collection) the data is not modified after being inserted into the column family. There are only read operations and deletion/archiving of old data based on the TTL or operator actions.

The above reasoning leads to the following conclusions and proposals.
* TWCS and DTCS (with certain modifications) lead to well-structured SSTables where the tables are organized time-wise, giving opportunities to manage the available disk capacity on nodes. Recovery from repairs also works (compacting the flood of small SSTables with larger ones).
* Being able to effectively split the SSTables along a given time line would lead to SSTable sets on all nodes that allow deleting or archiving SSTables. What would be the mechanism to inactivate SSTables during deletion/archiving so that nodes don't start streaming "missing" data between nodes (repairs)?
* Being able to split existing SSTables along multiple timelines determined by TWCS would allow insertion of older data into the column family that would eventually be compacted in the desired manner in the correct time window. The original SSTable would be split into several SSTables according to the time windows. In the end, empty SSTables would be discarded.
* The splitting action would be a tool executed through the nodetool command when needed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)