Date: Sun, 13 Dec 2015 02:37:46 +0000 (UTC)
From: "Ariel Weisberg (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

    [ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054761#comment-15054761 ]

Ariel Weisberg commented on CASSANDRA-9318:
-------------------------------------------

I got two cstar jobs to complete.

[This job is set to allow 16 megabytes of transactions per coordinator, and disables reads until they come back down to 12 megabytes.|http://cstar.datastax.com/graph?command=one_job&stats=d1e720c8-a125-11e5-9051-0256e416528f&metric=op_rate&operation=1_write&smoothing=1&show_aggregates=true&xmin=0&xmax=6664.35&ymin=0&ymax=11883.3]

[This job is set to allow 64 megabytes of transactions per coordinator, and disables reads until they come back down to 60 megabytes.|http://cstar.datastax.com/graph?command=one_job&stats=26853362-a127-11e5-80c2-0256e416528f&metric=op_rate&operation=1_write&smoothing=1&show_aggregates=true&xmin=0&xmax=322.85&ymin=0&ymax=12972.3]

The job with 64 megabytes in flight kind of looks like it failed after 300 seconds. I didn't expect the threshold for things to fall apart to be quite that low, but generally speaking, yes, more data in flight tends to cause bad things to happen.

So why did the second one fall apart? First off, mad props to whoever started collecting the GC logs. Lots of continual full GC at the end. Sure enough, the heap is only 1 gigabyte. Are we seriously running all our performance tests with a default heap of 1 gigabyte?

I don't think it failed due to in-flight requests (it only had 32 megabytes in flight). I think it OOMed due to other heap pressure. For this in-flight request backpressure to work, I think we need to include the weight of memtables when making the decision.

I am going to bump up the heap and try again to see if I can reduce the impact of other heap pressure to the point that we can start buffering more requests in flight.

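For readers following along, the high/low watermark behavior used in these jobs can be sketched roughly as below. This is only an illustration in Python, not the patch itself: the class name `InflightLimiter`, its methods, and the `reads_enabled` flag are all hypothetical, and the real change would toggle reads on client connections rather than flip a boolean. The sketch counts request bytes only, so it does not account for memtable weight or other heap pressure.

```python
MB = 1024 * 1024


class InflightLimiter:
    """Hypothetical hysteresis gate on request bytes in flight at one coordinator."""

    def __init__(self, high_watermark, low_watermark):
        if low_watermark > high_watermark:
            raise ValueError("low watermark must not exceed high watermark")
        self.high = high_watermark
        self.low = low_watermark
        self.inflight_bytes = 0
        self.reads_enabled = True  # stand-in for "read enabled on client connections"

    def on_request_start(self, size):
        """Account for a newly accepted request; returns whether reads stay enabled."""
        self.inflight_bytes += size
        if self.reads_enabled and self.inflight_bytes >= self.high:
            # High watermark reached: stop reading new requests from clients.
            self.reads_enabled = False
        return self.reads_enabled

    def on_request_complete(self, size):
        """Release a finished request; returns whether reads are enabled again."""
        self.inflight_bytes -= size
        if not self.reads_enabled and self.inflight_bytes <= self.low:
            # Back down to the low watermark: resume reading from clients.
            self.reads_enabled = True
        return self.reads_enabled


# Settings matching the first job: 16 MB high watermark, 12 MB low watermark.
limiter = InflightLimiter(16 * MB, 12 * MB)
```

The gap between the two watermarks is what keeps reads from flapping on and off around a single threshold: once reads are disabled at 16 MB, they stay disabled until in-flight bytes have drained to 12 MB.
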
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x, 2.2.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)