cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7203) Flush (and Compact) High Traffic Partitions Separately
Date Tue, 02 Dec 2014 09:24:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231203#comment-14231203 ]

Benedict commented on CASSANDRA-7203:
-------------------------------------

I was _mostly_ hoping to get your and [~kohlisankalp]'s views on _whether those workload skews
occur_. Then we could at some point later get into the nitty-gritty of whether it would be worth
it :-)

The idea wouldn't really be to special-case anything except flush, and to depend on (and be
implemented after) improvements we have either envisaged or could later envisage to avoid compacting
sstables with low predicted overlap of partitions, i.e. it would have the potential to improve
the benefit of such schemes by increasing the number of sstable pairings they can rule out.

> Flush (and Compact) High Traffic Partitions Separately
> ------------------------------------------------------
>
>                 Key: CASSANDRA-7203
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7203
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>              Labels: compaction, performance
>
> An idea possibly worth exploring is the use of streaming count-min sketches to collect
data over the up-time of a server to estimate the velocity of different partitions, so that
high-volume partitions can be flushed separately, on the assumption that they will be much
smaller in number, thus reducing write amplification by permitting compaction independently
of any low-velocity data.
> Whilst the idea is reasonably straightforward, it seems that the biggest problem here
will be defining any success metric. Obviously any workload following an exponential/zipf/extreme
distribution is likely to benefit from such an approach, but whether or not that would translate
into a benefit in real terms is another matter.
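
For readers unfamiliar with the structure named in the description, the following is a minimal
sketch of a count-min sketch used to pick out high-volume partitions at flush time. It is not
Cassandra code and not the proposed implementation: the names `CountMinSketch`, `split_for_flush`
and `hot_threshold`, the table dimensions, and the choice to count bytes written are all
assumptions made for illustration. It also ignores decay over time, which a real "velocity"
estimate would presumably need.

```python
import hashlib


class CountMinSketch:
    """Approximate per-key counters in fixed memory.

    Estimates can overcount (hash collisions) but never undercount.
    """

    def __init__(self, width=2048, depth=4):
        # width and depth are illustrative defaults, not tuned values.
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One independent hash per row, derived by salting with the row number.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, amount=1):
        """Record `amount` (e.g. bytes written) against a partition key."""
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += amount

    def estimate(self, key):
        """Return the smallest counter the key maps to across all rows."""
        return min(
            self.rows[row][self._index(row, key)] for row in range(self.depth)
        )


def split_for_flush(sketch, partition_keys, hot_threshold):
    """Partition keys into (hot, cold) lists by estimated write volume.

    hot_threshold is a hypothetical tuning knob, not an existing setting.
    """
    hot, cold = [], []
    for key in partition_keys:
        if sketch.estimate(key) >= hot_threshold:
            hot.append(key)
        else:
            cold.append(key)
    return hot, cold
```

In this sketch a flush would write the `hot` keys to one sstable and the `cold` keys to another,
so that later compaction of the hot data need not rewrite the cold data. Because estimates only
err upwards, a cold partition may occasionally be classified as hot, but never the reverse.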



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
