cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6109) Consider coldness in STCS compaction
Date Wed, 09 Oct 2013 17:10:42 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790622#comment-13790622
] 

Tyler Hobbs commented on CASSANDRA-6109:
----------------------------------------

bq. Suppose for instance that I have 11 sstables, one of which has 10M reads recently and
10 of which have 1M reads. If I set my threshold to 25% then nothing gets compacted which
is probably not what we want, since the 10 "cold" sstables collectively represent 50% of the
read activity.

Actually, in this case none of the sstables would be considered cold (assuming they all have
similar key estimates). The mean would be about 1.8M reads (20M / 11), and 0.25 * 1.8M = 0.45M,
so even the sstables with 1M reads are above the cutoff.
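
As a rough sketch of that arithmetic (Python, ignoring key estimates entirely; the
function name `cold_sstables` is made up for illustration and this is not the code in the
attached patches), a cutoff at a fraction of the mean read count might look like:

```python
def cold_sstables(reads, threshold):
    """Return indices of sstables whose read count is below threshold * mean."""
    mean = sum(reads) / len(reads)
    cutoff = threshold * mean
    return [i for i, r in enumerate(reads) if r < cutoff]


# One sstable with 10M recent reads, ten with 1M each (the example quoted above).
reads = [10_000_000] + [1_000_000] * 10
mean_reads = sum(reads) / len(reads)   # 20M / 11, roughly 1.8M
cutoff = 0.25 * mean_reads             # roughly 0.45M
cold = cold_sstables(reads, 0.25)      # empty: every sstable is above the cutoff
```

Under these assumptions the 1M-read sstables sit well above the cutoff, so all eleven stay
eligible for compaction.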

I agree that it might be difficult to tune intelligently, though.

bq. analyze hotness globally (per-CF) rather than per-bucket

That seems reasonable to me.

bq. configure the threshold based on hotness percentile (compact me if I am hotter than N%
of my peers)

This has the problem of always ignoring the coldest sstable even when there is little variation
between them.  So if you have four sstables with 1M, 1M, 1M, and 0.999M reads, the last will
be considered cold and never compacted.
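
To illustrate, here is a small Python sketch of one possible reading of the percentile
rule, where an sstable is cold unless it is strictly hotter than at least N% of its peers.
The function name `cold_by_percentile` and N = 25 are assumptions for the example, not
part of any patch:

```python
def cold_by_percentile(reads, n_percent):
    """Return indices of sstables that are NOT hotter than n_percent of their peers.

    Assumes at least two sstables, so every sstable has at least one peer.
    """
    cold = []
    for i, r in enumerate(reads):
        peers = reads[:i] + reads[i + 1:]
        hotter_than = sum(1 for p in peers if r > p)
        if 100 * hotter_than / len(peers) < n_percent:
            cold.append(i)
    return cold


# Four sstables with nearly identical read counts.
reads = [1_000_000, 1_000_000, 1_000_000, 999_000]
percentile_cold = cold_by_percentile(reads, 25)
```

With this reading, the sstable with 0.999M reads is hotter than none of its peers, so it is
the only one marked cold even though it differs from the others by 0.1%.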

> Consider coldness in STCS compaction
> ------------------------------------
>
>                 Key: CASSANDRA-6109
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Tyler Hobbs
>             Fix For: 2.0.2
>
>         Attachments: 6109-v1.patch, 6109-v2.patch
>
>
> I see two options:
> # Don't compact cold sstables at all
> # Compact cold sstables only if there is nothing more important to compact
> The latter is better if you have cold data that may become hot again...  but it's confusing
> if you have a workload such that you can't keep up with *all* compaction, but you can keep
> up with hot sstables.  (Compaction backlog stat becomes useless since we fall increasingly
> behind.)



--
This message was sent by Atlassian JIRA
(v6.1#6144)
