[ https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013867#comment-13013867
]
Stu Hood edited comment on CASSANDRA-2156 at 3/31/11 10:08 AM:

1. Fixed
2. Added and used {{FileUtils.close(Collection<Closeable>)}}
3. targetBytesPerMS only changes when the number of active threads changes: this gives nice
(imo) periodic feedback in the log about running compactions whenever one starts or finishes
4. Assuming compaction multithreading makes it in, throttling should never be disabled...
for someone who really wants to disable it, setting it to a high enough value that it never
kicks in should be sufficient?
5. Maybe... but dynamically adjusting the frequency at which we throttle and update {{bytesRead}}
would probably be better to do in another ticket?
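To make items 3-5 concrete, here is a minimal sketch of the kind of rate throttle being discussed. The names {{bytesRead}} and {{targetBytesPerMS}} follow the comment; the helper itself is hypothetical, not the patch's actual code:

```python
def throttle_delay_ms(bytes_read, elapsed_ms, target_bytes_per_ms):
    """How long (in ms) a compaction thread should sleep so that its
    observed rate (bytes_read / elapsed_ms) stays at or below the target.

    A non-positive target effectively disables throttling, which matches
    the "set it high enough that it never kicks in" suggestion in item 4.
    """
    if target_bytes_per_ms <= 0:
        return 0
    # Time the bytes *should* have taken at the target rate:
    expected_ms = bytes_read / target_bytes_per_ms
    return max(0, expected_ms - elapsed_ms)
```

For example, reading 1000 bytes in 100 ms against a 5 bytes/ms target means the work should have taken 200 ms, so the thread sleeps for the remaining 100 ms; if it is already behind the target rate, no sleep is needed.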

Regarding the approach to setting compaction_throughput_mb_per_sec: each bucket probably contains
{{MIN_THRESHOLD}} times more data than the previous bucket, and needs to be compacted {{1
/ MIN_THRESHOLD}} times as often (see the math in the description). This means that the number
of buckets influences how fast you need to compact, and that each additional bucket adds a
linear amount of necessary throughput (+ 1x your flush rate). Therefore, if you have 15 bucket
levels, and you are flushing {{1 MB/s}}, you need to compact at {{1 MB/s * 15}}.
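That rule of thumb is just a multiplication, but spelling it out as a (hypothetical) helper makes the units explicit:

```python
def required_compaction_mb_per_sec(flush_mb_per_sec, bucket_levels):
    """Each bucket level adds roughly one flush-rate's worth of compaction
    work, so the minimum sustainable compaction rate is flush rate * levels."""
    return flush_mb_per_sec * bucket_levels
```

With the numbers above, flushing at 1 MB/s across 15 bucket levels requires compacting at 15 MB/s to keep up.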
As an example: with {{MIN_THRESHOLD=2}}, each bucket is twice as large as the previous. Say
that we have 4 levels (buckets of sizes 1, 2, 4, 8) and that we need a compaction in the largest
bucket. The amount of data that needs to be compacted in that bucket will be equal to 1 more
than the sum of the sizes of all the other buckets (1 + 2 + 4 == 8 - 1). So, ideally we would
be able to compact those 8 units in _exactly_ the time it takes for 1 more unit to be flushed,
and for the compactions of the other buckets to trickle up and refill the largest bucket.
Pheew?
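The arithmetic in that example can be checked directly (a sketch, writing T for {{MIN_THRESHOLD}}):

```python
# Bucket sizes for MIN_THRESHOLD = T over `levels` levels: 1, T, T^2, ...
T, levels = 2, 4
sizes = [T ** n for n in range(levels)]   # [1, 2, 4, 8]

# Everything below the largest bucket sums to exactly one unit less than it,
# so compacting the largest bucket must finish in the time it takes the
# smaller buckets to trickle up plus one more flushed unit:
assert sum(sizes[:-1]) == sizes[-1] - 1   # 1 + 2 + 4 == 8 - 1
```

This is just the geometric-series identity 1 + T + ... + T^(n-1) == (T^n - 1) / (T - 1), which for T = 2 reduces to one less than the largest bucket.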
CASSANDRA-2171 will allow us to calculate the flush rate, which we can then multiply by the
count of buckets (note... one tiny missing piece is determining how many buckets are "empty":
an empty bucket is not created in the current approach).

> Final question. Would it be better to have fewer parallel compactions
As a base case, with no parallelism at all, you _will_ fall behind on compaction, because
every new bucket is a chance to compact. It's a fundamental question, but I haven't thought
about it... sorry.
> Compaction Throttling
> 
>
> Key: CASSANDRA-2156
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Stu Hood
> Fix For: 0.8
>
> Attachments: 0005-Throttle-total-compaction-to-a-configurable-throughput.txt,
for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, for-0.6-0002-Make-compaction-throttling-configurable.txt
>
>
> Compaction is currently relatively bursty: we compact as fast as we can, and then we
wait for the next compaction to be possible ("hurry up and wait").
> Instead, to properly amortize compaction, you'd like to compact exactly as fast as you
need to in order to keep the sstable count under control.
> For every new level of compaction, you need to increase the rate that you compact at:
a rule of thumb that we're testing on our clusters is to determine the maximum number of buckets
a node can support (aka, if the 15th bucket holds 750 GB, we're not going to have more than
15 buckets), and then multiply the flush throughput by the number of buckets to get a minimum
compaction throughput to maintain your sstable count.
> Full explanation: for a min compaction threshold of {{T}}, the bucket at level {{N}}
can contain {{SsubN = T^N}} 'units' (unit == memtable's worth of data on disk). Every time
a new unit is added, it has a {{1/SsubN}} chance of causing the bucket at level N to fill.
If the bucket at level N fills, it causes {{SsubN}} units to be compacted. So, for each active
level in your system you have {{SsubN * 1 / SsubN}}, or {{1}} amortized unit to compact any
time a new unit is added.
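The amortized claim in the description can be sanity-checked with a toy simulation. This is a sketch of idealized size-tiered behavior, not Cassandra's actual compaction code:

```python
def total_compaction_work(flushes, T=2):
    """Simulate idealized size-tiered compaction: each flush adds one unit
    (one memtable's worth) at level 0; whenever a level holds T sstables,
    they are compacted (rewriting T^(level+1) units of data) into a single
    sstable at the next level, which may cascade upward."""
    buckets = {}  # level -> number of sstables currently at that level
    work = 0      # total units of data rewritten by compactions
    for _ in range(flushes):
        buckets[0] = buckets.get(0, 0) + 1
        level = 0
        while buckets.get(level, 0) >= T:
            buckets[level] -= T
            work += T ** (level + 1)  # units rewritten by this compaction
            buckets[level + 1] = buckets.get(level + 1, 0) + 1
            level += 1
    return work

# With T=2 and 2^4 = 16 flushes, 4 levels get populated and the total
# compaction work is exactly flushes * levels: the amortized cost is
# ~1 unit of compaction per active level for every unit flushed.
assert total_compaction_work(16, T=2) == 16 * 4
```

Each compaction at level N rewrites T^(N+1) units but only happens once per T^(N+1) flushes, so every active level contributes exactly one amortized unit of compaction per flushed unit, matching the {{SsubN * 1 / SsubN == 1}} argument above.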

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
