cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
Date Fri, 08 May 2015 19:40:00 GMT


Jonathan Ellis commented on CASSANDRA-9318:

bq. it sounds like Jonathan is suggesting we simply prune our ExpiringMap based on bytes tracked as well as time?

No, I'm suggesting we abort requests more aggressively with OverloadedException *before sending
them to replicas*.  One place this might make sense is sendToHintedEndpoints, where we already
throw OE.

Right now we only throw OE once we start writing hints for a node that is in trouble.  This
doesn't seem to be aggressive enough.  (Although, most of our users are on 2.0 where we allowed
8x as many hints in flight before starting to throttle.)

So, I am suggesting we also track outstanding requests (perhaps with the ExpiringMap, as you
suggest) and stop accepting requests once we hit a reasonable limit of "you can't
possibly process more requests than this in parallel."
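
A minimal sketch of that idea, in Java, might look like the following. It is not the actual Cassandra code: the InFlightLimiter class, its method names, and the limit value are all hypothetical, and the OverloadedException here is a local stand-in rather than Cassandra's own class. The point is only that the check happens before anything is sent to replicas.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Local stand-in for an overload error; NOT Cassandra's real OverloadedException. */
class OverloadedException extends RuntimeException {
    OverloadedException(String message) { super(message); }
}

/** Hypothetical counter of requests the coordinator has accepted but not yet finished. */
class InFlightLimiter {
    private final int maxInFlight;
    private final AtomicInteger inFlight = new AtomicInteger();

    InFlightLimiter(int maxInFlight) { this.maxInFlight = maxInFlight; }

    /** Call before sending to replicas; throws if the limit is already reached. */
    void acquire() {
        while (true) {
            int current = inFlight.get();
            if (current >= maxInFlight)
                throw new OverloadedException("Too many in-flight requests: " + current);
            // CAS so that concurrent callers can never push the count past the limit.
            if (inFlight.compareAndSet(current, current + 1))
                return;
        }
    }

    /** Call when the request completes, fails, or expires. */
    void release() { inFlight.decrementAndGet(); }

    int outstanding() { return inFlight.get(); }
}
```

A caller would wrap each request in acquire() and a matching release() (for example in a finally block or an expiry callback), so that a request rejected at acquire() never reaches the replicas at all.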

> The ExpiringMap requests are already "in-flight" and cannot be cancelled, so their effect
> on other nodes cannot be rescinded, and imposing a limit does not stop us issuing more requests
> to the nodes in the cluster that are failing to keep up and respond to us.

Right, and I'm fine with that.  The goal is not to keep the replica completely out of trouble.
The goal is to keep the coordinator from falling over from buffering ExpiringMap and MessagingService
entries that it can't drain fast enough.  Secondarily, this will help the replica too, because
our existing load shedding is fine at recovering from temporary spikes in load, but it
isn't good enough to save the replica when the coordinators keep throwing more at it while
it's already overwhelmed.

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>                 Key: CASSANDRA-9318
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding
> the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests
> and if it reaches a high watermark disable read on client connections until it goes back below
> some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other