cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
Date Fri, 11 Dec 2015 22:45:46 GMT


Ariel Weisberg commented on CASSANDRA-9318:

I tried out limiting based on memory utilization. What I found was that, for the small amount
of throughput I can get out of my desktop, the 2 second timeout is sufficient to evict and
hint without OOM. If I extend the timeout enough I can get an OOM because eviction doesn't
keep up, which demonstrates that eviction has to take place to avoid OOM.

I see this change as useful in that it places a hard bound on working set size, but it is not sufficient on its own.

Performance tanks so badly as the heap fills up with requests being timed out that evicting
them is not a problem. If that is the case, maybe we should just be evicting them more aggressively
so they don't have an impact on performance, possibly based on the perceived odds of receiving
a response in a reasonable amount of time. It makes sense to me to use hinting as a method
of getting the data off the heap and batching the replication to slow nodes or DCs.

If we start evicting requests for a node, maybe we should take an adaptive approach and go
straight to hinting for the slow node for some % of requests. If the non-immediately hinted
requests start succeeding, we can gradually increase the % that go straight to the node.
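
To make the adaptive idea concrete, here is a minimal sketch in Java. It is not code from the patch: the class name AdaptiveHintRouter, the fixed step size, and the cap on the hinted fraction are all assumptions made for illustration. The cap is there so that some requests always reach the node, since otherwise nothing would ever report that it has recovered.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical per-node policy: what fraction of writes skip the node and go straight to hints. */
final class AdaptiveHintRouter {
    private final double step;         // assumed fixed adjustment per observed outcome
    private final double maxFraction;  // assumed cap (< 1.0) so some requests still probe the node
    private double directHintFraction = 0.0;

    AdaptiveHintRouter(double step, double maxFraction) {
        if (step <= 0.0 || maxFraction <= 0.0 || maxFraction >= 1.0) {
            throw new IllegalArgumentException("need step > 0 and 0 < maxFraction < 1");
        }
        this.step = step;
        this.maxFraction = maxFraction;
    }

    /** Decide for one write; 'roll' is a uniform random number in [0, 1). */
    synchronized boolean shouldHintDirectly(double roll) {
        return roll < directHintFraction;
    }

    boolean shouldHintDirectly() {
        return shouldHintDirectly(ThreadLocalRandom.current().nextDouble());
    }

    /** A request sent to the node was evicted after timing out: hint a larger share directly. */
    synchronized void onEvicted() {
        directHintFraction = Math.min(maxFraction, directHintFraction + step);
    }

    /** A request sent to the node succeeded: let a larger share go straight to the node. */
    synchronized void onSuccess() {
        directHintFraction = Math.max(0.0, directHintFraction - step);
    }

    synchronized double directHintFraction() {
        return directHintFraction;
    }
}
```

A coordinator would keep one such object per replica, call shouldHintDirectly() before sending a write, and feed the outcome of every request that did go to the node back through onEvicted() or onSuccess(). A fixed step is the simplest choice; a multiplicative back-off would be an equally plausible variant.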

I am trying to think of alternatives that don't end up kicking back an error to the application.
That's still an important capability to have because growing hints forever is not great, but
we can start by ensuring that the rest of the cluster can always operate at full speed even if
a node is slow. Separately, we can tackle bounding the resource utilization issues that presents.

Operationally how do people feel about having many gigabytes worth of hints to deliver? Is
that useful in that it allows things to continue until the slow node is addressed?

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>                 Key: CASSANDRA-9318
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x, 2.2.x
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding
> the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests
> and if it reaches a high watermark disable read on client connections until it goes back below
> some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other
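
The high/low watermark scheme in the issue description above can be sketched roughly as follows. This is only an illustration, not the implementation under review: the class name InFlightLimiter and its methods are hypothetical, and it tracks bytes only (a request count could be tracked the same way). How reads are actually paused on client connections is deliberately left out.

```java
/** Hypothetical tracker: pause client reads at a high watermark, resume below a low one. */
final class InFlightLimiter {
    private final long lowWatermarkBytes;
    private final long highWatermarkBytes;
    private long inFlightBytes = 0;
    private boolean readsPaused = false;

    InFlightLimiter(long lowWatermarkBytes, long highWatermarkBytes) {
        if (lowWatermarkBytes < 0 || lowWatermarkBytes > highWatermarkBytes) {
            throw new IllegalArgumentException("need 0 <= low <= high");
        }
        this.lowWatermarkBytes = lowWatermarkBytes;
        this.highWatermarkBytes = highWatermarkBytes;
    }

    /** Account for an accepted request; returns true if client reads should be paused. */
    synchronized boolean onRequestStart(long bytes) {
        inFlightBytes += bytes;
        if (inFlightBytes >= highWatermarkBytes) {
            readsPaused = true;
        }
        return readsPaused;
    }

    /** Account for a completed or evicted request; returns true if reads stay paused. */
    synchronized boolean onRequestDone(long bytes) {
        inFlightBytes -= bytes;
        // Resume only once usage drops below the low watermark, so the state does not flap.
        if (readsPaused && inFlightBytes < lowWatermarkBytes) {
            readsPaused = false;
        }
        return readsPaused;
    }

    synchronized long inFlightBytes() {
        return inFlightBytes;
    }
}
```

The gap between the two watermarks is what keeps the limiter from toggling on every request once the coordinator is near its limit.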

This message was sent by Atlassian JIRA
