cassandra-commits mailing list archives

From "Wei Deng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11380) Client visible backpressure mechanism
Date Sat, 26 Mar 2016 00:07:25 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212642#comment-15212642 ]

Wei Deng commented on CASSANDRA-11380:
--------------------------------------

bq. but one simple client mechanism, especially in bulk loading scenarios, is to set a slightly
higher consistency level.

That relies on exactly the load-shedding approach described in the first paragraph of the
description, and it is not always effective.

> Client visible backpressure mechanism
> -------------------------------------
>
>                 Key: CASSANDRA-11380
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11380
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Coordination
>            Reporter: Wei Deng
>
> Cassandra currently lacks a sophisticated backpressure mechanism to prevent clients from
ingesting data at too high a throughput. One of the reasons is its SEDA (Staged Event Driven
Architecture) design. With SEDA, an overloaded thread pool can drop droppable messages (in this
case, MutationStage can drop mutation or counter mutation messages) once they exceed the 2-second
timeout. This can save the JVM from running out of memory and crashing. However, this kind of
load-shedding-based backpressure has two downsides. First, a larger number of dropped mutations
increases the chance of inconsistency among replicas and will likely require more repair (hints
can help to some extent, but they are not designed to cover all inconsistencies). Second, excessive
writes also put much more pressure on compaction (especially LCS); backlogged compaction increases
read latency and causes more frequent GC pauses, and depending on the type of compaction, some
backlog can take a long time to clear even after the write load is removed. The current
load-shedding mechanism therefore seems inadequate for a common bulk loading scenario, where
clients try to ingest data at the highest possible throughput. We need a more direct way to
tell the client drivers to slow down.
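
To make the load-shedding behaviour described above concrete, here is a minimal Python sketch.
It is not Cassandra code: the `drain` function, the queue layout, and the `apply` callback are
hypothetical names chosen for illustration. Only the 2-second timeout comes from the description.

```python
from collections import deque

TIMEOUT_SECONDS = 2.0  # the 2-second timeout from the description


def drain(queue, now, apply, timeout=TIMEOUT_SECONDS):
    """Process queued (enqueued_at, mutation) pairs in order.

    Mutations that waited longer than `timeout` are dropped instead of
    applied. Returns the number of dropped mutations.
    Hypothetical helper, not part of Cassandra.
    """
    dropped = 0
    while queue:
        enqueued_at, mutation = queue.popleft()
        if now - enqueued_at > timeout:
            dropped += 1  # shed load: the caller never learns about this
            continue
        apply(mutation)
    return dropped
```

Note that the sender gets no signal when a mutation is dropped this way, which is the gap the
description points at.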
> It appears that HBase suffered from a similar situation, as discussed in HBASE-5162, and
introduced a special exception type to tell the client to slow down when a certain "overloaded"
criterion is met. If we can leverage a similar mechanism, our dropped mutation events can be
used to trigger such exceptions to push back on the client; at the same time, backlogged compaction
(when the number of pending compactions exceeds a certain threshold) can also be used for the
push back, which can prevent the vicious cycle mentioned in https://issues.apache.org/jira/browse/CASSANDRA-11366?focusedCommentId=15198786&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15198786.
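
The proposal above could look roughly like the following Python sketch. Everything in it is an
assumption made for illustration: `OverloadedError`, `should_push_back`, `write_with_backoff`,
and the threshold defaults are hypothetical and do not correspond to an existing Cassandra or
driver API. The server-side check combines the two triggers named in the description (dropped
mutations and pending compactions), and the client side reacts by waiting and retrying.

```python
import time


class OverloadedError(Exception):
    """Hypothetical 'slow down' signal sent to the client."""


def should_push_back(dropped_mutations, pending_compactions,
                     max_dropped=0, max_pending=100):
    """Return True when either overload criterion is met.

    The threshold defaults are placeholders, not recommended values.
    """
    return dropped_mutations > max_dropped or pending_compactions > max_pending


def write_with_backoff(send, payload, max_retries=5, base_delay=0.1,
                       sleep=time.sleep):
    """Call send(payload); on OverloadedError, wait and retry.

    The delay doubles after every push-back. After max_retries retries
    the error is re-raised so the caller sees the push-back.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            return send(payload)
        except OverloadedError:
            if attempt == max_retries:
                raise
            sleep(delay)
            delay *= 2
```

Exponential backoff is only one possible client reaction; a driver could equally lower its
request rate or its number of in-flight requests.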



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
