cassandra-commits mailing list archives

From "Alan Liang (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-2735) Timestamp Based Compaction Strategy
Date Thu, 29 Sep 2011 06:20:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117032#comment-13117032 ]

Alan Liang commented on CASSANDRA-2735:
---------------------------------------

We've tested this patch internally and noticed that it actually resulted in a lot more
compactions than the SizeTieredCompactionStrategy. The increase in IO was not acceptable
for our use case, so we stopped working on this patch.

Internally, we ended up implementing expiration of sstables within SizeTieredCompactionStrategy.
We've called it SizeTieredExpirableCompactionStrategy. Given a set of all sstables, the compaction
procedure becomes:

1. Expire sstables based on max timestamp of the sstable. Remove expired sstables from the
set.
2. Remove from the set any sstables whose size is >= a max size.
3. Run the SizeTieredCompactionStrategy on the remaining sstables.
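
The three steps above can be sketched roughly as follows. This is only an illustration in
Python, not the actual patch: the SSTable record, plan_compaction, naive_size_tiers, and the
expire_after and max_size_bytes parameters are all hypothetical names, the expiration rule
(max timestamp older than a cutoff) is an assumption, and the bucketing function is a
simplified stand-in for SizeTieredCompactionStrategy rather than its real logic.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class SSTable:
    # Hypothetical record; real sstables carry far more metadata.
    name: str
    size_bytes: int
    max_timestamp: int  # newest write timestamp contained in the sstable


def naive_size_tiers(sstables: List[SSTable], low: float = 0.5,
                     high: float = 1.5, min_threshold: int = 4) -> List[List[SSTable]]:
    """Simplified stand-in for size-tiered bucketing (an assumption, not the real code).

    Groups sstables whose size is within [low, high] times the running average
    size of a bucket, and keeps only buckets with at least min_threshold members.
    """
    buckets: List[List[SSTable]] = []
    for table in sorted(sstables, key=lambda t: t.size_bytes):
        for bucket in buckets:
            avg = sum(t.size_bytes for t in bucket) / len(bucket)
            if low * avg <= table.size_bytes <= high * avg:
                bucket.append(table)
                break
        else:
            buckets.append([table])
    return [b for b in buckets if len(b) >= min_threshold]


def plan_compaction(
    sstables: List[SSTable],
    now: int,
    expire_after: int,
    max_size_bytes: int,
    size_tiered: Callable[[List[SSTable]], List[List[SSTable]]] = naive_size_tiers,
) -> Tuple[List[SSTable], List[List[SSTable]]]:
    cutoff = now - expire_after
    # Step 1: expire sstables whose newest data is older than the cutoff (assumed rule).
    expired = [t for t in sstables if t.max_timestamp < cutoff]
    live = [t for t in sstables if t.max_timestamp >= cutoff]
    # Step 2: drop sstables that have reached the max size from consideration.
    eligible = [t for t in live if t.size_bytes < max_size_bytes]
    # Step 3: hand the remaining sstables to the size-tiered strategy.
    return expired, size_tiered(eligible)
```

In this sketch, an sstable removed in step 2 is simply left alone until step 1 eventually
expires it as a whole.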

The downside of this strategy is that during compaction, newer sstables could be mixed with
older sstables, and the resulting compacted sstable gets marked with the max timestamp of the
newer sstable. This means you won't be able to expire the older rows within the sstable until
the entire sstable expires. This problem of compacting really old sstables with newer
sstables is mitigated by the restriction that an sstable is taken out of consideration for
compaction once it reaches a certain max sstable size. This works because older sstables tend
to be larger files.
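
To make the downside concrete, here is a small arithmetic sketch in Python. The function
names and the expire_after parameter are hypothetical, and it assumes an sstable becomes
expirable once its max timestamp plus expire_after has passed.

```python
from typing import Iterable


def merged_max_timestamp(input_max_timestamps: Iterable[int]) -> int:
    # A compacted sstable is marked with the newest max timestamp of its inputs.
    return max(input_max_timestamps)


def expirable_at(max_timestamp: int, expire_after: int) -> int:
    # Assumed rule: the whole sstable can be dropped once this time is reached.
    return max_timestamp + expire_after


old_ts, new_ts, expire_after = 100, 900, 500

# Left alone, the old sstable could be dropped at time 600.
old_alone = expirable_at(old_ts, expire_after)

# Once compacted with the newer sstable, its rows linger until time 1400.
after_merge = expirable_at(merged_max_timestamp([old_ts, new_ts]), expire_after)
```

The gap between the two values is how much longer the older rows stay on disk after such a
mixed compaction.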

This is currently working for our specific use case of storing time-series data. I can
post the patch for this SizeTieredExpirableCompactionStrategy if there is interest. I'll
have to rebase it first.
                
> Timestamp Based Compaction Strategy
> -----------------------------------
>
>                 Key: CASSANDRA-2735
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Alan Liang
>            Assignee: Alan Liang
>            Priority: Minor
>              Labels: compaction
>         Attachments: 0001-timestamp-bucketed-compaction-strategy-V2.patch, 0001-timestamp-bucketed-compaction-strategy.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the sstables while
> satisfying max sstable size, min and max compaction thresholds. It also handles expiration
> of sstables based on a timestamp.
