cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7203) Flush (and Compact) High Traffic Partitions Separately
Date Sat, 29 Nov 2014 11:22:12 GMT


Benedict commented on CASSANDRA-7203:

[~jbellis]: Are we sure that's a good policy? It's generally accepted that a lot of work
(especially work driven by people, e.g. at Netflix or Apple) follows a zipfian/extreme
distribution. If we can prevent the most voluminous customers from degrading performance for
everybody, that's surely a pretty big win? I'm not suggesting this be attacked immediately,
but in the medium-to-long term it seems like a pretty decent yield, and it could be applied on
both the read and the write path. If you have 1% of your data appearing in ~100% of sstables,
but the other 99% appearing in only ~1% of your sstables, you're compacting an order of
magnitude more often than you might otherwise need to.

Perhaps [~jasobrown] and [~kohlisankalp] have an idea of how realistic this scenario is?

> Flush (and Compact) High Traffic Partitions Separately
> ------------------------------------------------------
>                 Key: CASSANDRA-7203
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>              Labels: compaction, performance
> An idea possibly worth exploring is the use of streaming count-min sketches to collect
> data over the up-time of a server to estimate the velocity of different partitions, so that
> high-volume partitions can be flushed separately, on the assumption that they will be much
> smaller in number, thus reducing write amplification by permitting them to be compacted
> independently of any low-velocity data.
> Whilst the idea is reasonably straightforward, it seems that the biggest problem here
> will be defining any success metric. Obviously any workload following an
> exponential/zipf/extreme distribution is likely to benefit from such an approach, but whether
> or not that would translate into real-world gains is another matter.
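
For readers unfamiliar with the data structure named in the description, here is a minimal
sketch of a count-min sketch used to separate high-traffic partition keys from the rest. It is
an illustration only, not Cassandra code: the class CountMinSketch, the helper
split_for_flush, the width/depth defaults and the threshold are all hypothetical, and it counts
writes rather than estimating a true per-time velocity as the ticket proposes.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency estimator; estimates never undercount."""

    def __init__(self, width=2048, depth=4):
        # width and depth are illustrative defaults, not tuned values.
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One independent hash per row, derived by salting BLAKE2b with the row index.
        data = key.encode("utf-8")
        for row in range(self.depth):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        # Record `count` writes against the partition key.
        for row, col in self._cells(key):
            self.rows[row][col] += count

    def estimate(self, key):
        # Collisions can only inflate a cell, so the minimum is the tightest bound.
        return min(self.rows[row][col] for row, col in self._cells(key))


def split_for_flush(sketch, partition_keys, threshold):
    """Hypothetical helper: split keys into (hot, cold) by estimated write count."""
    hot, cold = [], []
    for key in partition_keys:
        (hot if sketch.estimate(key) >= threshold else cold).append(key)
    return hot, cold
```

Because a count-min sketch can overestimate but never underestimate, a policy built this way
may occasionally treat a quiet partition as hot, but it will not miss a genuinely hot one whose
true count reaches the threshold.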

This message was sent by Atlassian JIRA
