cassandra-commits mailing list archives

From "Stefania (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
Date Wed, 13 Jul 2016 01:45:21 GMT


Stefania commented on CASSANDRA-9318:

bq. Right, but there isn't much we can do without way more invasive changes. Anyway, I don't
think that's actually a problem, as if the coordinator is overloaded we'll end up generating
too many hints and fail with OverloadedException (this time with its original meaning), so
we should be covered.

I tend to agree that it is an approximation we can live with; I also would rather not change
the lower levels of messaging service for this.

bq. Does it mean we should advance the protocol version in this issue, or delegate to a new

We have a number of issues waiting for protocol V5; they are labeled {{protocolv5}}. Either
we make this issue dependent on V5 as well or, since we are committing this as disabled, we
delegate to a new issue that is dependent on V5.

bq. Do you see any complexity I'm missing there?

A new flag would involve a new version, and it would need to be handled during rolling upgrades.
Even if on its own it is not too complex, the system in its entirety becomes even more complex
(different versions, compression, cross-node timeouts, some verbs are droppable, others aren't,
and the list goes on). Unless it solves a problem, I don't think we should consider it; and
we are saying in other parts of this conversation that hints are no longer a problem.

bq. as the advantage would be increased consistency at the expense of more resource consumption,

IMO we don't increase consistency if the client has already been told the mutation failed. If we are
instead referring to replicas that were outside the CL pool and temporarily overloaded, I think
they are better off dropping mutations and handling them later on through hints. Basically,
I see dropping mutations replica-side as a self-defense mechanism for replicas; I don't think
we should remove it. Rather, we should focus on a backpressure strategy such that replicas
don't need to drop mutations. Also, for the time being, I'd rather focus on the major issue,
which is that we haven't reached consensus on how to apply backpressure yet, and propose this
new idea in a follow-up ticket if backpressure is successful.

bq. These are valid concerns of course, and given similar concerns from Jonathan Ellis, I'm
working on some changes to avoid write timeouts due to healthy replicas unnaturally throttled
by unhealthy ones, and depending on Jonathan Ellis answer to my last comment above, maybe
only actually back-pressure if the CL is not met.

OK, so we are basically trying to address the 3 scenarios by throttling/failing only if the
system as a whole cannot handle the mutations (that is, at least CL replicas are slow/overloaded),
whereas if fewer than CL replicas are slow/overloaded, those replicas get hinted?
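To make the CL-aware idea concrete, here is a minimal sketch of the decision rule being discussed: back-pressure only when the healthy replicas cannot satisfy the consistency level, and hint the overloaded minority otherwise. All names here ({{ClAwareBackpressure}}, {{Replica}}, {{decide}}) are hypothetical illustrations, not the actual patch.

```java
import java.util.List;

public class ClAwareBackpressure {
    /** A replica with a flag saying whether it currently looks overloaded (hypothetical model). */
    public record Replica(String address, boolean overloaded) {}

    /**
     * Sketch of the rule under discussion: if at least CL replicas are healthy,
     * proceed and hint the overloaded rest; otherwise the system as a whole is
     * overloaded, so apply back-pressure (throttle or fail the write).
     */
    public static String decide(List<Replica> replicas, int consistencyLevel) {
        long healthy = replicas.stream().filter(r -> !r.overloaded()).count();
        if (healthy >= consistencyLevel)
            return "proceed-and-hint-overloaded"; // CL can be met by healthy replicas
        return "apply-backpressure";              // fewer than CL healthy replicas
    }
}
```

Under this rule, a single slow replica in an RF=3 / CL=QUORUM ring never throttles healthy writers; throttling kicks in only once two of the three replicas are struggling.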

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>                 Key: CASSANDRA-9318
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Sergio Bossa
>         Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, limit.btm,
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding
the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests
and if it reaches a high watermark disable read on client connections until it goes back below
some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other

This message was sent by Atlassian JIRA
