cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-4316) Compaction Throttle too bursty with large rows
Date Thu, 10 Jan 2013 00:26:12 GMT


Jonathan Ellis commented on CASSANDRA-4316:

Looks like mayThrottle (both implementations) is missing the conversion from bytes to MB.
(Actually, looking at the RateLimiter creation, the name {{dataSizeInMB}} is misleading,
since the value is still in bytes.)
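
To make the unit question concrete, here is a minimal sketch of the two conversions involved, assuming the throttle setting is expressed in MB/s and the limiter counts permits in bytes. The class and method names are hypothetical and are not taken from the patch.

```java
// Hypothetical helper; not part of the attached patch.
final class ThrottleUnits {
    static final long BYTES_PER_MB = 1024L * 1024L;

    /** Converts a throttle setting in MB/s into a limiter rate in bytes/s. */
    static double bytesPerSecond(int mbPerSecond) {
        return (double) mbPerSecond * BYTES_PER_MB;
    }

    /** Converts a byte count into MB, for a limiter whose permits are MB. */
    static double toMB(long bytes) {
        return (double) bytes / BYTES_PER_MB;
    }
}
```

Either convention works as long as the limiter's rate and the permits requested from it use the same unit; a variable that holds bytes would then be better named along the lines of dataSizeInBytes.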

Does StandaloneScrubber ratelimit?  It probably shouldn't.

Scanner in cleanup compaction needs a withRateLimit.

Is looping over scanner.getCurrentPosition for each row compacted going to eat CPU?  Maybe
every N rows would be better, with N = 1MB / average row size.  Quite possibly it's not actually
a problem and I'm prematurely complexifying.
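
A minimal sketch of the "every N rows" idea, with N = 1MB / average row size. The names are hypothetical, and clamping N to at least 1 is an added assumption so that rows larger than 1MB are still checked on every row.

```java
// Hypothetical helper; not part of the attached patch.
final class ThrottleInterval {
    static final long CHECK_EVERY_BYTES = 1024L * 1024L; // 1MB between checks

    /** Returns how many rows to compact between position checks (always >= 1). */
    static long rowsBetweenChecks(long averageRowSizeBytes) {
        if (averageRowSizeBytes <= 0) {
            return 1; // unknown or empty rows: fall back to checking every row
        }
        return Math.max(1L, CHECK_EVERY_BYTES / averageRowSizeBytes);
    }
}
```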

- "// throttle if needed" comment is redundant
- "maybeThrottle" is a better method name than "mayThrottle"
- May be able to simplify getCompactionRateLimiter by creating a default limiter even if we
are unthrottled (since if we are unthrottled we ignore it anyway), so you don't need to worry
about == null checks
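
The last point could look roughly like the sketch below: the limiter is always created, and the "unthrottled" decision lives in one place, so callers never test for null. All names here are hypothetical stand-ins, and treating a throughput of 0 as "unthrottled" is an assumption of the sketch.

```java
import java.util.Objects;

// Hypothetical sketch; none of these types are taken from the patch.
final class CompactionLimiterSketch {
    /** Stand-in for a rate limiter that blocks until the permits are granted. */
    interface Limiter {
        void acquire(long bytes);
    }

    /** Stand-in implementation that records permits instead of sleeping. */
    static final class CountingLimiter implements Limiter {
        long acquired;

        @Override
        public void acquire(long bytes) {
            acquired += bytes;
        }
    }

    private final int throughputMbPerSec; // 0 means "unthrottled" in this sketch
    private final Limiter limiter;        // always non-null, even when unthrottled

    CompactionLimiterSketch(int throughputMbPerSec, Limiter limiter) {
        this.throughputMbPerSec = throughputMbPerSec;
        this.limiter = Objects.requireNonNull(limiter);
    }

    /** Never returns null, so callers need no null check. */
    Limiter getCompactionRateLimiter() {
        return limiter;
    }

    void maybeThrottle(long bytes) {
        if (throughputMbPerSec <= 0) {
            return; // unthrottled: the default limiter exists but is ignored
        }
        limiter.acquire(bytes);
    }
}
```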

Rest LGTM.  Should we open a new ticket to move FST to RateLimiter and get rid of Throttle?

> Compaction Throttle too bursty with large rows
> ----------------------------------------------
>                 Key: CASSANDRA-4316
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Wayne Lewis
>            Assignee: Yuki Morishita
>             Fix For: 1.2.1
>         Attachments: 4316-1.2.txt
> In org.apache.cassandra.db.compaction.CompactionIterable the check for compaction throttling
> occurs once every 1000 rows. In our workload this is much too large as we have many large
> rows (16 - 100 MB).
> With a 100 MB row, about 100 GB is read (and possibly written) before the compaction
> throttle sleeps. This causes bursts of essentially unthrottled compaction IO followed by a
> long sleep, which yields inconsistent performance and high error rates during the bursts.
> We applied a workaround to check the throttle every row, which solved our performance and
> error issues:
> line 116 in org.apache.cassandra.db.compaction.CompactionIterable:
>                 if ((row++ % 1000) == 0)
> replaced with
>                 if ((row++ % 1) == 0)
> I think the better solution is to calculate how often the throttle should be checked based
> on the throttle rate, to apply sleeps more consistently. E.g. if 16MB/sec is the limit then
> check for sleep after every 16MB is read so sleeps are spaced out about every second.
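
The rate-based check proposed in the description could be sketched as follows: count bytes rather than rows, and signal a throttle check once roughly one second's worth of data at the configured limit has been read. The class name and method are hypothetical, and the sketch assumes a positive limit in MB/s.

```java
// Hypothetical sketch of a byte-based check interval; not the committed fix.
final class ByteBasedThrottleCheck {
    private final long checkEveryBytes;
    private long bytesSinceCheck;

    /** Assumes limitMbPerSec > 0; a limit of 16 means a check every 16MB read. */
    ByteBasedThrottleCheck(int limitMbPerSec) {
        this.checkEveryBytes = limitMbPerSec * 1024L * 1024L;
    }

    /** Records the bytes read for one row; returns true when a check is due. */
    boolean onRowRead(long rowSizeBytes) {
        bytesSinceCheck += rowSizeBytes;
        if (bytesSinceCheck < checkEveryBytes) {
            return false;
        }
        bytesSinceCheck = 0; // start counting toward the next check
        return true;
    }
}
```

With a 16MB/sec limit and 100 MB rows, every row triggers a check, which matches the reporter's workaround; with small rows, checks are spaced about 16MB apart instead of 1000 rows apart.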
