cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
Date Mon, 29 Jun 2015 09:31:05 GMT


Benedict commented on CASSANDRA-9318:

bq. I think you're \[Benedict\] overselling how scary it is to stop reading new requests until
we can free up some memory from MS.

The problem is that freeing up memory can be constrained by one or a handful of _dead_ nodes.
We can not only stop accepting work, but significantly reduce cluster throughput as a result
of a *single* timed-out node. I'm not overselling anything, although I may have a different
risk analysis than you do.

Take a simple mathematical thought experiment: we have a four node cluster (pretty common),
with RF=3, serving 100kop/s per coordinator; these operations in memory occupy around 2K as
a Mutation (again, pretty typical). Ordinary round-trip time is 10ms (also, pretty typical).

So, under normal load we would see around 2Mb of data maintained for our queries in-flight
across the cluster. But now one node goes down. This node is a peer for 3/4 of all writes
to the cluster, so we see 150Mb/s of data accumulate in each coordinator. Our limit is probably
no more than 300Mb (probably lower). Our timeout is 10s. So we now have 8s during which nothing
can be done, across the cluster, due to one node's death. After that 8s has elapsed, we get
another flurry. Then another 8s of nothing. Even with a CL of ONE.

This really is fundamentally opposed to the whole idea of Cassandra, and I cannot emphasize
how much I am against it except as a literal last resort when all other strategies have failed.

bq. Hinting is better than leaving things in an unknown state but it's not something we should
opt users into if we have a better option, since it basically turns the write into CL.ANY.

I was under the impression we had moved to talking about ACK'd writes. I'm not suggesting
we ack with success to the handler. 

What we do with unack'd writes is actually less important, and we have a much freer reign
with. We could throw OE. We could block, as you suggest, since these should be more evenly

However I would prefer we do both, i.e., when we run out of room in the coordinator, we should
look to see if there are any nodes that have well in excess of their fair share of entries
waiting for a response. Let's call these nodes N

# if N=0, we block consumption of new writes, as you propose.
# otherwise, we first evict those that have been ACK'd to the client and can be safely hinted
(and hint them)
# if this isn't enough, we evict handlers that, if all N were to fail, would break the CL
we are waiting on, and we throw OE

step 3 is necessary both for CL.ALL, and the scenario where 2 failing nodes have significant

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>                 Key: CASSANDRA-9318
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x, 2.2.x
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding
the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests
and if it reaches a high watermark disable read on client connections until it goes back below
some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other

This message was sent by Atlassian JIRA

View raw message