cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
Date Mon, 14 Dec 2015 18:14:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056411#comment-15056411
] 

Ariel Weisberg commented on CASSANDRA-9318:
-------------------------------------------

Quick note. 65k mutations pending in the mutation stage. 7 memtables pending flush. [I hooked
memtables pending flush into the backpressure mechanism.|https://github.com/apache/cassandra/commit/494eabf48ab48f1e86c058c0b583166ab39dcc39]
That absolutely wrecked performance as throughput dropped to zero periodically, but throughput
is still infinitely higher than when the database has OOMed.

Kicked off a few performance runs to demonstrate what happens when you do have backpressure
and you try various large limits on in flight memtables/requests.

[9318 w/backpressure 64m 8g heap memtables count|http://cstar.datastax.com/tests/id/fa769eec-a283-11e5-bbc9-0256e416528f]
[9318 w/backpressure 1g 8g heap memtables count|http://cstar.datastax.com/tests/id/4c52dd6e-a286-11e5-bbc9-0256e416528f]
[9318 w/backpressure 2g 8g heap memtables count|http://cstar.datastax.com/tests/id/b3d5b470-a286-11e5-bbc9-0256e416528f]

I am setting the point where backpressure turns off to almost the same limit as the point where it
turns on. This smooths out performance just enough for stress to not constantly emit huge
numbers of errors as writes time out because the database stops serving requests for a long
time waiting for a memtable to flush.
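
A minimal sketch of the on/off thresholds described above, assuming a single gauge such as the number of memtables pending flush. The class name BackpressureGate, its method names, and the threshold values are hypothetical and are not taken from the linked commit.

```java
/** Hypothetical on/off gate with separate thresholds; not Cassandra's actual code. */
final class BackpressureGate {
    private final long onThreshold;   // engage when the gauge reaches this value
    private final long offThreshold;  // release when the gauge falls to this value or lower
    private boolean engaged;

    BackpressureGate(long onThreshold, long offThreshold) {
        if (offThreshold >= onThreshold) {
            throw new IllegalArgumentException("offThreshold must be below onThreshold");
        }
        this.onThreshold = onThreshold;
        this.offThreshold = offThreshold;
    }

    /** Feed the latest gauge reading; returns true while writes should be held back. */
    synchronized boolean update(long gauge) {
        if (!engaged && gauge >= onThreshold) {
            engaged = true;
        } else if (engaged && gauge <= offThreshold) {
            engaged = false;
        }
        return engaged;
    }
}
```

With something like new BackpressureGate(7, 6), the gate engages at 7 pending memtables and releases as soon as one of them finishes flushing, so the off point sits almost at the on point. A wider gap would keep writes blocked for longer after each stall.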

With pressure from memtables somewhat accounted for, the remaining source of pressure that
can bring down a node is remotely delivered mutations. I can throw those into the calculation
and add a listener that blocks reads from other cluster nodes. It's a nasty thing to do, but
maybe not that different from OOM.

I am going to hack together something to force a node to be slow so I can demonstrate overwhelming
it with remotely delivered mutations first.

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x, 2.2.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding
> the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests
> and if it reaches a high watermark disable read on client connections until it goes back below
> some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other
> issues.
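
The description above could be sketched roughly as follows. This is an illustration under stated assumptions, not the patch attached to the ticket: the class InFlightLimiter, its method names, and the watermark parameters are all hypothetical. The caller is assumed to act on the returned flag by pausing or resuming reads on client connections (in a Netty-based server that might mean toggling auto-read on the channel, which is also an assumption here).

```java
/** Hypothetical tracker of in-flight requests and bytes with high/low watermarks. */
final class InFlightLimiter {
    private final int highRequests;
    private final int lowRequests;
    private final long highBytes;
    private final long lowBytes;

    private int requests;        // requests currently in flight
    private long bytes;          // request bytes currently in flight
    private boolean readsPaused; // true while client reads should be disabled

    InFlightLimiter(int highRequests, int lowRequests, long highBytes, long lowBytes) {
        if (lowRequests > highRequests || lowBytes > highBytes) {
            throw new IllegalArgumentException("low watermarks must not exceed high watermarks");
        }
        this.highRequests = highRequests;
        this.lowRequests = lowRequests;
        this.highBytes = highBytes;
        this.lowBytes = lowBytes;
    }

    /** Record a request entering the coordinator; returns true if reads should be paused. */
    synchronized boolean onRequestStart(long size) {
        requests++;
        bytes += size;
        // Either limit reaching its high watermark is enough to pause reads.
        if (requests >= highRequests || bytes >= highBytes) {
            readsPaused = true;
        }
        return readsPaused;
    }

    /** Record a request completing; returns true if reads should remain paused. */
    synchronized boolean onRequestDone(long size) {
        requests--;
        bytes -= size;
        // Resume only once both counters are back at or below their low watermarks.
        if (readsPaused && requests <= lowRequests && bytes <= lowBytes) {
            readsPaused = false;
        }
        return readsPaused;
    }
}
```

Pausing on either limit but resuming only when both have drained is one possible policy. Requests that were already read off the socket before the pause still complete and are still counted, so the high watermark bounds load only approximately.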



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
