Date: Fri, 8 May 2015 19:40:00 +0000 (UTC)
From: "Jonathan Ellis (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

    [ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535356#comment-14535356 ]

Jonathan Ellis commented on CASSANDRA-9318:
-------------------------------------------

bq. it sounds like Jonathan is suggesting we simply prune our ExpiringMap based on bytes tracked as well as time?

No, I'm suggesting we abort requests more aggressively with OverloadedException *before sending them to replicas*. One place this might make sense is sendToHintedEndpoints, where we already throw OE. Right now we only throw OE once we start writing hints for a node that is in trouble.
This doesn't seem to be aggressive enough. (Although, most of our users are on 2.0 where we allowed 8x as many hints in flight before starting to throttle.)

So, I am suggesting we also track requests outstanding (perhaps with the ExpiringMap as you suggest) as well and stop accepting requests once we hit a reasonable limit of "you can't possibly process more requests than this in parallel."

> The ExpiringMap requests are already "in-flight" and cannot be cancelled, so their effect on other nodes cannot be rescinded, and imposing a limit does not stop us issuing more requests to the nodes in the cluster that are failing to keep up and respond to us.

Right, and I'm fine with that. The goal is not to keep the replica completely out of trouble. The goal is to keep the coordinator from falling over from buffering EM and MessagingService entries that it can't drain fast enough.

Secondarily, this will help the replica too because our existing load shedding is fine at recovering from temporary spikes in load. But our load shedding isn't good enough to save it when the coordinators keep throwing more at it when it's already overwhelmed.

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other issues.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
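
The high/low watermark idea discussed above can be sketched roughly as follows. This is only an illustration, not Cassandra code: the class name `InFlightLimiter`, its methods, and the watermark parameters are all hypothetical, and it counts requests only (tracking bytes would work the same way with a second counter). The sketch resumes accepting once the count is at or below the low watermark, and its state transitions are only approximate under heavy concurrency.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical coordinator-side limiter (not part of Cassandra).
 * Rejects new requests once the in-flight count would exceed the high
 * watermark, and keeps rejecting until it drains to the low watermark.
 */
final class InFlightLimiter {
    private final long highWatermark;
    private final long lowWatermark;
    private final AtomicLong inFlight = new AtomicLong();
    // True while we are refusing new work and waiting to drain.
    private volatile boolean shedding = false;

    InFlightLimiter(long highWatermark, long lowWatermark) {
        if (lowWatermark < 0 || lowWatermark >= highWatermark)
            throw new IllegalArgumentException("need 0 <= low < high");
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    /**
     * Returns true if the request is admitted. The caller must call
     * release() exactly once when an admitted request completes or expires.
     */
    boolean tryAcquire() {
        if (shedding)
            return false;
        long now = inFlight.incrementAndGet();
        if (now > highWatermark) {
            // Over the limit: undo the increment and start shedding.
            inFlight.decrementAndGet();
            shedding = true;
            return false;
        }
        return true;
    }

    /** Marks one admitted request as finished. */
    void release() {
        long now = inFlight.decrementAndGet();
        if (now <= lowWatermark)
            shedding = false; // drained far enough; accept requests again
    }

    long inFlightCount() {
        return inFlight.get();
    }
}
```

In a coordinator, a `false` result from `tryAcquire()` would be the point at which the request is rejected (for example by throwing OverloadedException) before anything is sent to replicas, and `release()` would be called when the response arrives or the entry expires.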