cassandra-commits mailing list archives

From "Jeff Jirsa (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-8340) Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions
Date Sat, 30 May 2015 02:28:17 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa updated CASSANDRA-8340:
----------------------------------
    Attachment: 8340.diff

We have a use case where we cannot rely on the max timestamp: in some partitions (but not
all) it can be modified to be pathologically high, though never pathologically low. The existing
filtering code then works as designed, but not in a way that benefits us. We could write a
custom DTCS-like replacement, but this proposed improvement has the nice side effect of also
solving our problem, so I've written a simple patch that enables it optionally, for users who
prefer to filter on {{sstable.getMinTimestamp()}} rather than {{sstable.getMaxTimestamp()}}.

The proposed syntax is fairly straightforward. Standard behavior is unchanged:

{noformat}
create table test (id text primary key) 
with compaction={
'max_sstable_age_days': '1', 
'base_time_seconds': '3600', 
'min_threshold': '4', 
'class': 'DateTieredCompactionStrategy'
} ;
{noformat}

Filtering on the min timestamp is enabled with an optional boolean flag:

{noformat}
create table test (id text primary key) 
with compaction={
'max_sstable_age_days': '1', 
'base_time_seconds': '3600', 
'min_threshold': '4', 
'class': 'DateTieredCompactionStrategy', 
'use_min_timestamp' : 'true' 
} ;
{noformat}

This decreases risk for existing users, but adds functionality for people who are migrating
from STCS and/or have other reasons to prefer getMinTimestamp().
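
To make the difference concrete, here is a minimal, self-contained Java sketch of the age filter. It is not the code in 8340.diff: the class {{AgeFilterSketch}}, the {{SSTableInfo}} holder, the {{candidates}} method and the sample values are all hypothetical, and timestamps are plain seconds for readability. It only assumes that an sstable is dropped from consideration once the chosen timestamp falls before {{now - max age}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AgeFilterSketch {
    static final long DAY = 86_400L;      // one day, in seconds (illustrative unit)
    static final long NOW = 100 * DAY;    // hypothetical "current time"

    /** Hypothetical stand-in for the timestamp metadata of one sstable. */
    static final class SSTableInfo {
        final String name;
        final long minTimestamp;
        final long maxTimestamp;

        SSTableInfo(String name, long minTimestamp, long maxTimestamp) {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Names of the sstables that are still young enough to be compaction candidates. */
    static List<String> candidates(List<SSTableInfo> sstables, long now, long maxAge,
                                   boolean useMinTimestamp) {
        long cutoff = now - maxAge;
        List<String> kept = new ArrayList<>();
        for (SSTableInfo s : sstables) {
            // The only difference between the two modes is which timestamp is compared.
            long ts = useMinTimestamp ? s.minTimestamp : s.maxTimestamp;
            if (ts >= cutoff) {
                kept.add(s.name);
            }
        }
        return kept;
    }

    /** Sample data: one old, one recent, one major-compacted and one skewed sstable. */
    static List<SSTableInfo> sample() {
        return Arrays.asList(
            new SSTableInfo("old", 5 * DAY, 20 * DAY),
            new SSTableInfo("recent", 99 * DAY + 43_200, 99 * DAY + 82_800),
            // Result of a major compaction: spans all data, old and new.
            new SSTableInfo("major", 1 * DAY, 99 * DAY + 82_800),
            // Max timestamp pushed pathologically far into the future.
            new SSTableInfo("skewed", 10 * DAY, 10_000 * DAY));
    }

    public static void main(String[] args) {
        // Max age of one day, as in 'max_sstable_age_days': '1'.
        System.out.println(candidates(sample(), NOW, DAY, false)); // [recent, major, skewed]
        System.out.println(candidates(sample(), NOW, DAY, true));  // [recent]
    }
}
```

In this sketch, filtering on the max timestamp keeps both the major-compacted sstable and the one with the skewed max timestamp in play, while filtering on the min timestamp leaves only the genuinely recent sstable as a candidate.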

What do you guys think, [~krummas] and [~Bj0rn]?

> Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8340
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8340
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Priority: Minor
>         Attachments: 8340.diff
>
>
> Currently we check how old the newest data (max timestamp) in an sstable is when we check
> if it should be compacted.
> If we instead switch to using min timestamp for this we have a pretty clean migration
> path from STCS/LCS to DTCS.
> My thinking is that before migrating, the user does a major compaction, which creates
> a huge sstable containing all data, with a min timestamp very far back in time. After switching
> to DTCS, we will have a big sstable that we never compact (i.e., the min timestamp of this big
> sstable is before max_sstable_age_days), and all newer data will come after it and will be
> compacted properly.
> WDYT [~Bj0rn] ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
