cassandra-commits mailing list archives

From "Jonathan Ellis (Commented) (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-3442) TTL histogram for sstable metadata
Date Mon, 21 Nov 2011 15:40:52 GMT


Jonathan Ellis commented on CASSANDRA-3442:

Should we do single-sstable compactions *after* the bucket compactions?  Doing them first
means we might compact them twice, when the bucket-based compaction would have been adequate.

It looks like this will never stop recompacting sstables with high expiring column counts,
until they finally expire and are expunged.  I think we need to address this somehow, possibly
by waiting until some fraction of gc_grace_seconds has elapsed since sstable creation (which
we can just get from mtime).
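A minimal sketch of the age gate suggested above, assuming a single-sstable compaction is only considered once some fraction of gc_grace_seconds has passed since the sstable file was written. The class name, method name, and the `graceFraction` parameter are hypothetical and are not taken from the attached patch or from Cassandra's code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of Cassandra's API.
final class SingleSSTableCompactionGate {
    /**
     * Returns true once the sstable is old enough to be considered for a
     * single-sstable compaction.
     *
     * @param sstableMtimeMillis file modification time, used as the creation time
     * @param nowMillis          current wall-clock time in milliseconds
     * @param gcGraceSeconds     the column family's gc_grace_seconds
     * @param graceFraction      assumed tunable, e.g. 0.5 for "half of gc_grace"
     */
    static boolean isOldEnough(long sstableMtimeMillis, long nowMillis,
                               int gcGraceSeconds, double graceFraction) {
        long minAgeMillis = (long) (TimeUnit.SECONDS.toMillis(gcGraceSeconds) * graceFraction);
        return nowMillis - sstableMtimeMillis >= minAgeMillis;
    }
}
```

With a gate like this, an sstable that was just rewritten by a single-sstable compaction gets a fresh mtime and would not be picked again until the waiting period has elapsed, which is one way to bound the repeated recompaction described above.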

If we can reasonably test this in CompactionsTest, I'd like to add that.

> TTL histogram for sstable metadata
> ----------------------------------
>                 Key: CASSANDRA-3442
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: compaction
>             Fix For: 1.1
>         Attachments: 3442.txt
> Under size-tiered compaction, you can generate large sstables that compact infrequently.
> With expiring columns mixed in, we could waste a lot of space in this situation.
> If we kept a TTL EstimatedHistogram in the sstable metadata, we could do a single-sstable
> compaction against sstables with over 20% (?) expired data.
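To make the proposal concrete, here is a rough sketch of how an expired-data ratio could be derived from per-sstable expiration counts and compared against the 20% figure floated above. It uses an exact `TreeMap` rather than Cassandra's EstimatedHistogram, and every name in it (`TtlHistogramSketch`, `add`, `addNonExpiring`, `expiredRatio`) is an assumption made for illustration.

```java
import java.util.TreeMap;

// Illustrative stand-in for a TTL histogram kept in sstable metadata.
// An exact map is used here for simplicity; a real implementation would
// likely use a bounded, approximate histogram.
final class TtlHistogramSketch {
    // key: expiration time in seconds since the epoch, value: column count
    private final TreeMap<Long, Long> countsByExpiration = new TreeMap<>();
    private long totalColumns = 0;

    /** Records columns that expire at the given time. */
    void add(long expiresAtSeconds, long columns) {
        countsByExpiration.merge(expiresAtSeconds, columns, Long::sum);
        totalColumns += columns;
    }

    /** Records columns that carry no TTL; they only count toward the total. */
    void addNonExpiring(long columns) {
        totalColumns += columns;
    }

    /** Fraction of columns whose expiration time is at or before nowSeconds. */
    double expiredRatio(long nowSeconds) {
        if (totalColumns == 0) {
            return 0.0;
        }
        long expired = 0;
        for (long count : countsByExpiration.headMap(nowSeconds, true).values()) {
            expired += count;
        }
        return (double) expired / totalColumns;
    }

    /** True when the expired fraction exceeds the threshold, e.g. 0.2. */
    boolean worthCompactingAlone(long nowSeconds, double threshold) {
        return expiredRatio(nowSeconds) > threshold;
    }
}
```

The threshold is left as a parameter because the 20% value in the description is marked as tentative.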

This message is automatically generated by JIRA.