Date: Fri, 15 Jan 2016 17:04:39 +0000 (UTC)
From: "Wei Deng (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-10306) Splitting SSTables in time, deleting and archiving SSTables

     [ https://issues.apache.org/jira/browse/CASSANDRA-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Deng updated CASSANDRA-10306:
---------------------------------
    Labels: dtcs  (was: )

> Splitting SSTables in time, deleting and archiving SSTables
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-10306
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10306
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Antti Nissinen
>              Labels: dtcs
>
> This document is a continuation of [CASSANDRA-10195|https://issues.apache.org/jira/browse/CASSANDRA-10195] and describes the need to be able to split SSTables along time boundaries, as also discussed in [CASSANDRA-8361|https://issues.apache.org/jira/browse/CASSANDRA-8361]. The data model is explained briefly, followed by the practical issues of running Cassandra with time series data and the resulting need for splitting capabilities.
> Data model (snippet from [CASSANDRA-9644|https://issues.apache.org/jira/browse/CASSANDRA-9644]):
> The data is time series data. It is saved so that one row contains a certain time span of data for a given metric (20 days in this case). The row key contains the start time of the time span and the metric name. The column name gives the offset from the beginning of the time span. The column timestamp is set by adding the offset to the timestamp from the row key, i.e. it equals the actual timestamp of the data point. The data model is analogous to the KairosDB implementation.
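> As an illustration only (this sketch is not part of the ticket), the arithmetic of that layout can be written out roughly as follows; the function, the metric name and the epoch-aligned 20-day spans are assumptions made up for the example:
> {code}
> # Illustrative sketch of the KairosDB-style layout described above: one row
> # holds a fixed time span (here 20 days) of one metric, the column name is
> # the offset from the start of that span, and the column (write) timestamp
> # is set to the actual sample time.
> from datetime import datetime, timezone
>
> SPAN_MS = 20 * 24 * 3600 * 1000   # length of one row's time span, in milliseconds
>
> def locate(metric: str, sample_time_ms: int):
>     """Return (row_key, column_name, column_timestamp) for one data point."""
>     span_start = sample_time_ms - (sample_time_ms % SPAN_MS)  # start of the 20-day span
>     row_key = (metric, span_start)                 # metric name + span start time
>     column_name = sample_time_ms - span_start      # offset from the span start
>     column_timestamp = span_start + column_name    # equals the actual sample time
>     return row_key, column_name, column_timestamp
>
> # Example: a sample taken at 2016-01-05 12:00:00 UTC
> ts = int(datetime(2016, 1, 5, 12, tzinfo=timezone.utc).timestamp() * 1000)
> print(locate("pump.pressure", ts))
> {code}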
> In the practical application the data is added in real time to the column family. When converting from a legacy system, old data is pre-loaded in chronological order by faking the column timestamps before the real-time data collection is started. However, there is intermittently a need to insert older data as well, because it has not been available in real time, or because additional time series are fed in afterwards due to unforeseeable needs.
> Adding old data simultaneously with real-time data will lead to SSTables that contain data from a time period exceeding the length of the compaction window (TWCS and DTCS). Therefore the SSTables do not behave in a predictable manner in the compaction process.
> Tombstones mask the data from queries, but releasing the disk space requires that the SSTables containing the tombstones get compacted together with the SSTables holding the original data. When using TWCS or DTCS and writing tombstones with timestamps corresponding to the current time, the SSTables containing the original data will never end up being compacted with the SSTables holding the tombstones. Even when writing tombstones with faked timestamps, the resulting SSTable should be written apart from the ongoing real-time data; otherwise the SSTables have to be split (see below).
> TTL is a working method for deleting data from a column family and releasing disk space in a predictable manner. However, setting the correct TTL is not a trivial task. The required TTL might change, e.g. due to legislation, or the customer might want a longer lifetime for the data.
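> The following is a simplified, illustrative model of the window arithmetic and not the actual TWCS/DTCS code; the one-day window and the concrete timestamps are assumptions made for the example. It shows why a tombstone written with a current timestamp lands in a different time window than the old data it shadows, and the same arithmetic explains why a flush that mixes back-filled and real-time writes produces an SSTable spanning several windows:
> {code}
> # Time-windowed strategies group SSTables by the window of their write
> # timestamps. A tombstone issued "now" therefore ends up in the current
> # window, while the data it shadows sits in an old window, and the two are
> # never candidates for the same time-window compaction.
> DAY_MS = 24 * 3600 * 1000
> WINDOW_MS = 1 * DAY_MS          # assumed compaction window size: one day
>
> def window(write_ts_ms: int) -> int:
>     """Bucket a write timestamp into its compaction window (window start)."""
>     return write_ts_ms - (write_ts_ms % WINDOW_MS)
>
> data_write_ts = 1451952000000        # sample written at its real time, 2016-01-05
> tombstone_write_ts = 1452902400000   # deletion issued "now", 2016-01-16
>
> # Different windows -> the disk space behind the shadowed data is not
> # reclaimed until something forces the two SSTables together.
> print(window(data_write_ts) == window(tombstone_write_ts))   # False
> {code}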
> The other factor affecting disk space consumption is the variability of the rate at which data is fed into the column family. In certain troubleshooting cases the sample rate can be increased tenfold for a large portion of the collected time series. This leads to rapid consumption of disk space, and old data has to be deleted or archived in such a manner that the disk space is released quickly and predictably.
> Losing one or more nodes from the cluster without having spare hardware will also lead to a situation where the data from the lost node has to be replicated again onto the remaining nodes. This increases the disk space consumption per node and probably requires cleaning some older data away from the active column family.
> All of the above issues could of course be handled simply by adding more disk space or more nodes to the cluster. In a cloud environment that would be a feasible option. For an application running on real hardware in an isolated environment it is not, for practical reasons or because of costs; getting new hardware on site might take a long time, e.g. due to customs regulations.
> In the application domain (time series data collection) the data is not modified after it has been inserted into the column family. There are only read operations and deletion / archiving of old data, based on the TTL or on operator actions.
> The above reasoning leads to the following conclusions and proposals:
> * TWCS and DTCS (with certain modifications) lead to well structured SSTables, organized along time, which gives opportunities to manage the available disk capacity on the nodes. Recovering from repairs also works (compacting the flood of small SSTables with larger ones).
> * Being able to effectively split SSTables along a given timeline would lead to SSTable sets on all nodes that allow deleting or archiving SSTables. What would be the mechanism to inactivate SSTables during deletion / archiving so that nodes don’t start streaming the “missing” data between nodes (repairs)?
> * Being able to split existing SSTables along multiple timelines determined by TWCS would allow insertion of older data into the column family that would eventually be compacted in the desired manner into the correct time window. The original SSTable would be streamed into several SSTables according to the time windows, and in the end the empty SSTables would be discarded (a rough sketch of this split is given after this list).
> * The splitting action would be a tool executed through the nodetool command when needed.
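> As a rough, illustrative sketch only (operating on plain timestamped records instead of real SSTables, and not the proposed nodetool implementation), the intended split could behave as follows; the window size and the sample records are assumptions made for the example:
> {code}
> # Sketch of the proposed split semantics: the contents of one input
> # "SSTable" are redistributed into one output per time window, and windows
> # that receive nothing are simply never produced (no empty SSTables).
> from collections import defaultdict
>
> WINDOW_MS = 24 * 3600 * 1000   # assumed one-day compaction window
>
> def split_by_window(cells):
>     """Group the cells of one SSTable into per-window buckets (window start -> cells)."""
>     buckets = defaultdict(list)
>     for write_ts, cell in cells:
>         buckets[write_ts - (write_ts % WINDOW_MS)].append((write_ts, cell))
>     return dict(buckets)
>
> # One input table mixing back-filled and current data splits into two outputs:
> mixed = [(1451692800000, "a"), (1451695200000, "b"), (1452902400000, "c")]
> for window_start, part in sorted(split_by_window(mixed).items()):
>     print(window_start, len(part))   # the 2016-01-02 window and the 2016-01-16 window
> {code}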