Date: Mon, 2 Oct 2017 21:02:00 +0000 (UTC)
From: "Dan Kinder (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Created] (CASSANDRA-13923) Flushers blocked due to many SSTables

Dan Kinder created CASSANDRA-13923:
--------------------------------------

             Summary: Flushers blocked due to many SSTables
                 Key: CASSANDRA-13923
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13923
             Project: Cassandra
          Issue Type: Bug
          Components: Compaction, Local Write-Read Paths
         Environment: Cassandra 3.11.0
                      Centos 6 (downgraded JNA)
                      64GB RAM
                      12-disk JBOD
            Reporter: Dan Kinder
         Attachments: cassandra-jstack-readstage.txt, cassandra-jstack.txt

This started on the mailing list and I'm not 100% sure of the root cause, feel free to re-title if needed.

I just upgraded Cassandra from 2.2.6 to 3.11.0. Within a few hours of serving traffic, thread pools begin to back up and grow pending tasks indefinitely. This happens to multiple different stages (Read, Mutation) and consistently builds pending tasks for MemtablePostFlush and MemtableFlushWriter.

Using jstack shows that there is blocking going on when trying to call getCompactionCandidates, which seems to happen on flush. We have fairly large nodes that have ~15,000 SSTables per node, all LCS.
It seems like this can cause reads to get blocked, because they try to acquire a read lock when calling shouldDefragment. Writes, of course, block once we can't allocate any more memtables, because flushes are backed up.

We did not have this problem in 2.2.6, so there appears to be a regression that makes calls such as getCompactionCandidates, which list out the SSTables, incredibly slow. In our case this causes nodes to build up pending tasks and simply stop responding to requests.
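
The read-lock contention suspected in this report can be illustrated with a minimal, self-contained Java sketch. It is not Cassandra's code: the class name, method name, and timings are hypothetical, and it assumes (the report only suspects this) that a slow SSTable-listing call holds the exclusive side of a read-write lock while read-path threads wait for the shared side.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockContentionSketch {

    /**
     * Returns true if a reader failed to get the read lock within
     * readerWaitMillis while another thread held the write lock for
     * writerHoldMillis. All names here are hypothetical.
     */
    public static boolean readerStarvedBySlowWriter(long writerHoldMillis, long readerWaitMillis) {
        // Stand-in for whatever lock guards the SSTable listing (assumption).
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch writerHasLock = new CountDownLatch(1);

        // Stand-in for a flush-time call that scans many SSTables slowly.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            try {
                writerHasLock.countDown();
                Thread.sleep(writerHoldMillis); // simulated slow scan
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.writeLock().unlock();
            }
        });
        writer.start();

        try {
            // Wait until the writer really owns the lock, then act as a read.
            writerHasLock.await();
            boolean acquired = lock.readLock().tryLock(readerWaitMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }
            writer.join();
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        // Slow holder, impatient reader: the reader gives up.
        System.out.println("starved (slow scan): " + readerStarvedBySlowWriter(500, 100));
        // Fast holder, patient reader: the reader gets through.
        System.out.println("starved (fast scan): " + readerStarvedBySlowWriter(50, 2000));
    }
}
```

In a real node the readers would block indefinitely rather than time out, which would match the steadily growing pending-task counts described in the report; the timeout here only keeps the sketch from hanging.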