cassandra-commits mailing list archives

From "Jeff Jirsa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11366) Compaction backlog triggered backpressure to write operations
Date Thu, 17 Mar 2016 05:46:33 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198786#comment-15198786 ]

Jeff Jirsa commented on CASSANDRA-11366:
----------------------------------------

There are other times when pending compactions will increase (notably bootstrapping and repair, where streaming - especially in a vnode situation - can create dozens, hundreds, or thousands of sstables). If you apply backpressure and drop inbound mutations in these situations, you'll end up needing to repair, which will again create tons of sstables, which will create more backpressure, and so on in a vicious cycle.

As an operator, I would strongly prefer the existing behavior to having one more source of backpressure to try to reason about. The right answer is for operations teams to monitor pending compactions and sstable counts and adjust accordingly - especially since adjusting may mean tuning concurrent compactors or compaction throughput, not throttling writes.
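
For reference, a minimal sketch of that kind of monitoring, reading the pending-compaction gauge Cassandra exposes over JMX (this assumes the default JMX port 7199 and the org.apache.cassandra.metrics:type=Compaction,name=PendingTasks MBean; the alert threshold is arbitrary, not a Cassandra setting):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class PendingCompactionsCheck {
    public static void main(String[] args) throws Exception {
        // Cassandra exposes its metrics over JMX, by default on port 7199.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            // Same number reported as "pending tasks" by `nodetool compactionstats`.
            ObjectName pendingTasksGauge = new ObjectName(
                    "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks");
            Number pending = (Number) mbs.getAttribute(pendingTasksGauge, "Value");
            System.out.println("Pending compactions: " + pending);

            // Illustrative alerting threshold only.
            final int threshold = 100;
            if (pending.intValue() > threshold) {
                System.out.println("Compaction is falling behind; consider tuning "
                        + "compaction throughput / concurrent compactors before "
                        + "throttling the write workload.");
            }
        }
    }
}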
 

> Compaction backlog triggered backpressure to write operations
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-11366
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11366
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths
>            Reporter: Wei Deng
>
> A general perception about Cassandra is that it is super efficient at taking writes and can handle very high write throughput, thanks to its simple and efficient commitlog/memtable write path. However, one aspect that is often overlooked by new Cassandra users is that when they ingest a lot of data at the highest throughput possible, trying to push the hardware's limit, a seemingly harmless write workload can create far more I/O and CPU consumption than their hardware can accommodate as soon as compaction starts to kick in. As a result, memtable flushes take much longer to finish due to I/O contention from compaction, thousands of SSTables start to show up on the data disk because compaction cannot process them all in time, and GC pauses become more frequent and impactful because reads have to touch far more SSTables than in an ideal situation. Depending on the compaction strategy a Cassandra table uses, this kind of overwhelmed, backlogged situation can take a long time to clear up (for LeveledCompactionStrategy, or LCS, it could be hours) even after the write workload is removed.
> Currently writes have a backpressure mechanism called load shedding, which is triggered if the MutationStage gets too overwhelmed and a mutation takes more than 2 seconds to be acknowledged; in that case the mutation message is dropped and a hint is written by StorageProxy. However, this mechanism does not take compaction backlog into consideration. As a result, compaction can fall far behind under a heavy write workload and continue to accumulate more and more pending compactions and unprocessed SSTables, while writes keep flowing at the same rate because write timeouts are not yet triggered. For LCS and DTCS this can get out of control and become hard to recover from. As a practical workaround, you can monitor the number of pending compactions and, if it starts trending upward, reduce the write throughput. This has proved to be effective. However, these two types of activities are usually handled by different teams in an enterprise, so they are not always easy to coordinate. If there were a backpressure mechanism (to slow down the intake of writes) that could be triggered automatically based on how far behind compaction is, it would make DBAs' lives easier.
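
A minimal sketch of what such a compaction-aware throttle could look like on the write path (everything here - class name, method, and thresholds - is hypothetical, not an existing or proposed Cassandra API; the pending-compaction count is assumed to come from the compaction manager):

// Hypothetical sketch only; illustrative names and knob values.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

public final class CompactionBackpressure {
    // Illustrative knobs; real values would presumably live in cassandra.yaml.
    private static final int PENDING_COMPACTION_THRESHOLD = 200;
    private static final long MAX_PAUSE_MICROS = 10_000;

    private CompactionBackpressure() {}

    /**
     * Would be called on the write path before a mutation is applied. The caller
     * passes in the current pending-compaction count; if compaction is far enough
     * behind, the write pauses briefly so intake slows down instead of mutations
     * being dropped outright.
     */
    public static void maybeThrottle(int pendingCompactions) {
        if (pendingCompactions <= PENDING_COMPACTION_THRESHOLD) {
            return;
        }
        // Scale the pause with how far behind compaction is, up to a cap.
        long overload = pendingCompactions - PENDING_COMPACTION_THRESHOLD;
        long pauseMicros = Math.min(overload * 10, MAX_PAUSE_MICROS);
        LockSupport.parkNanos(TimeUnit.MICROSECONDS.toNanos(pauseMicros));
    }
}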



