Date: Thu, 2 Mar 2017 12:36:46 +0000 (UTC)
From: "Christian Esken (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Comment Edited] (CASSANDRA-13265) Expiration in OutboundTcpConnection can block the reader Thread

[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891905#comment-15891905 ]

Christian Esken edited comment on CASSANDRA-13265 at 3/2/17 12:36 PM:
----------------------------------------------------------------------

Ariel wrote:
{quote}
Expiration is based on time. There is no point in attempting expiration again immediately because almost nothing will have expired. It allows one bad connection to consume resources it shouldn't in the form of hijacking a thread to iterate a list. I don't see the downside of switching from a boolean to a long and CASing that instead. If we aren't confident in it we can set a small interval so that it still checks for expiration often although I think that just generates useless work. We can't make timeouts pass faster.
{quote}

[~aweisberg], I understand that you want to CAS on "lastExpirationTime", right? I am also in favor of doing this. It fits better and still keeps the change simple.
In that case the Thread should iterate the whole Queue and not bail out on the first hit. I will change it in the PR.

> Expiration in OutboundTcpConnection can block the reader Thread
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-13265
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 1.8.0_112-b15)
> Linux 3.16
>            Reporter: Christian Esken
>            Assignee: Christian Esken
>         Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to communicate with the other nodes. This can happen at any time, during peak load or low load. Restarting that single node fixes the issue.
> Before going into details, I want to state that I have analyzed the situation and am already developing a possible fix.
> Here is the analysis so far:
> - A thread dump taken in this situation showed 324 Threads in the OutboundTcpConnection class that want to lock the backlog queue to do expiration.
> - A class histogram shows 262508 instances of OutboundTcpConnection$QueuedMessage.
> What is the effect? As soon as the Cassandra node has reached a certain number of queued messages, it starts thrashing itself to death. Each of the Threads fully locks the Queue for reading and writing by calling iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operations can a Thread progress with actually writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next() and fully lock the Queue.
> This means: Writing blocks the Queue for reading, and readers might even be starved, which makes the situation even worse.
> -----
> The setup is:
> - 3-node cluster
> - replication factor 2
> - Consistency LOCAL_ONE
> - No remote DCs
> - high write throughput (100000 INSERT statements per second and more during peak times).

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
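
For illustration, the idea discussed in the comment (CAS on a "lastExpirationTime" value so that at most one Thread per interval walks the whole Queue) could look roughly like the sketch below. This is not the code from the PR or from OutboundTcpConnection. The class ExpiringBacklog, the method names, the intervalNanos parameter and the simplified QueuedMessage are all hypothetical, and the use of a LinkedBlockingQueue for the backlog is an assumption. The current time is passed in explicitly only to keep the sketch deterministic; real code would more likely read System.nanoTime().

```java
import java.util.Iterator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of time-gated expiration; not the actual Cassandra code. */
final class ExpiringBacklog {

    /** Simplified stand-in for OutboundTcpConnection$QueuedMessage (assumption). */
    static final class QueuedMessage {
        final String payload;
        final long expiresAtNanos;

        QueuedMessage(String payload, long expiresAtNanos) {
            this.payload = payload;
            this.expiresAtNanos = expiresAtNanos;
        }

        boolean isExpired(long nowNanos) {
            return nowNanos - expiresAtNanos >= 0;
        }
    }

    // Assumption: the backlog is a LinkedBlockingQueue, whose iterator locks the queue.
    private final BlockingQueue<QueuedMessage> backlog = new LinkedBlockingQueue<>();
    private final long intervalNanos;
    private final AtomicLong lastExpirationNanos;

    ExpiringBacklog(long intervalNanos, long nowNanos) {
        this.intervalNanos = intervalNanos;
        this.lastExpirationNanos = new AtomicLong(nowNanos);
    }

    /** Producers try to expire first, but only one of them per interval does the work. */
    void enqueue(QueuedMessage message, long nowNanos) {
        expireMessages(nowNanos);
        backlog.add(message);
    }

    /** Returns the number of dropped messages, or 0 if this caller skipped the pass. */
    int expireMessages(long nowNanos) {
        long last = lastExpirationNanos.get();
        if (nowNanos - last < intervalNanos) {
            return 0; // too soon: almost nothing can have expired since the last pass
        }
        if (!lastExpirationNanos.compareAndSet(last, nowNanos)) {
            return 0; // another Thread won the CAS and runs this interval's pass
        }
        int dropped = 0;
        // Walk the whole queue instead of bailing out on the first hit.
        for (Iterator<QueuedMessage> it = backlog.iterator(); it.hasNext();) {
            if (it.next().isExpired(nowNanos)) {
                it.remove();
                dropped++;
            }
        }
        return dropped;
    }

    int size() {
        return backlog.size();
    }
}
```

With this shape, a Thread that loses the CAS or arrives before the interval has elapsed returns immediately and never touches the queue's iterator, so a large number of writers would no longer pile up on the queue lock just to attempt expiration.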