Date: Thu, 20 Aug 2015 12:34:45 +0000 (UTC)
From: "A Markov (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Created] (CASSANDRA-10138) Millions of compaction tasks on empty DB

A Markov created CASSANDRA-10138:
------------------------------------

             Summary: Millions of compaction tasks on empty DB
                 Key: CASSANDRA-10138
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10138
             Project: Cassandra
          Issue Type: Bug
         Environment: CentOS 6.5 and Cassandra 2.1.8
            Reporter: A Markov


A fresh installation of Cassandra 2.1.8 with no data in the database except the system tables becomes unresponsive about 5-10 minutes after startup.

The problem was first discovered on an empty 12-node cluster through a schema creation failure: the script was exiting on timeout with an error.
Analysis of the log files showed that nodes were repeatedly reported as DOWN and then, after some time, as UP again. This was reported for multiple nodes. The system.log file showed that the nodes were constantly performing GC, during which all cores of the system were 100% busy, and this caused the nodes to disconnect after some time.

Further analysis with nodetool (tpstats option) showed that just 10 minutes after a clean node restart, the node had completed more than 47M compaction tasks and had more than 12M pending. Here is an example of the output:

nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
CounterMutationStage              0         0              0         0                 0
ReadStage                         0         0              0         0                 0
RequestResponseStage              0         0              0         0                 0
MutationStage                     0         0            257         0                 0
ReadRepairStage                   0         0              0         0                 0
GossipStage                       0         0              0         0                 0
CacheCleanupExecutor              0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
ValidationExecutor                0         0              0         0                 0
Sampler                           0         0              0         0                 0
MemtableReclaimMemory             0         0              8         0                 0
InternalResponseStage             0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
MiscStage                         0         0              0         0                 0
CommitLogArchiver                 0         0              0         0                 0
MemtableFlushWriter               0         0              8         0                 0
PendingRangeCalculator            0         0              1         0                 0
MemtablePostFlush                 0         0             44         0                 0
CompactionExecutor                0  12996398       47578625         0                 0
AntiEntropySessions               0         0              0         0                 0
HintedHandoff                     0         1              2         0                 0

I am repeating myself, but that was on a TOTALLY EMPTY DB, 10 minutes after Cassandra was started.

I was able to repeatedly reproduce the same issue and behaviour with a single Cassandra instance. The issue persisted after I did a full Cassandra wipe and reinstall from the repository.

I discovered that the issue disappears if I execute
nodetool disableautocompaction

In that case the system quickly (in a matter of 20-30 seconds) goes through all pending tasks and becomes idle. If I enable autocompaction again, within about 1 minute it jumps to millions of pending tasks again.

I verified this on the same server with Cassandra 2.1.6, and the issue was not present.

The log files do not show any ERROR messages. There were only warnings about GC events that were taking too long.
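
For anyone trying to confirm the same symptom, the check described above (looking at the Pending column of the nodetool tpstats output) could be automated roughly as follows. This is only a minimal sketch: the helper names parse_tpstats and backlogged, and the threshold of 1000 pending tasks, are assumptions made for illustration and are not part of nodetool or Cassandra. The sample text is a subset of the output quoted in this report.

```python
import re
from typing import Dict, NamedTuple


class PoolStats(NamedTuple):
    active: int
    pending: int
    completed: int
    blocked: int
    all_time_blocked: int


# One data row: a pool name followed by exactly five integer columns.
# The header row does not match because its columns are not numeric.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$")


def parse_tpstats(text: str) -> Dict[str, PoolStats]:
    """Parse the thread-pool table printed by `nodetool tpstats`."""
    pools: Dict[str, PoolStats] = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            name = match.group(1)
            pools[name] = PoolStats(*(int(v) for v in match.groups()[1:]))
    return pools


def backlogged(pools: Dict[str, PoolStats], threshold: int = 1000) -> Dict[str, int]:
    """Return pools whose pending count exceeds `threshold`.

    The default threshold is an arbitrary illustrative value.
    """
    return {name: s.pending for name, s in pools.items() if s.pending > threshold}


# Subset of the tpstats output quoted in this report.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0            257         0                 0
CompactionExecutor                0  12996398       47578625         0                 0
HintedHandoff                     0         1              2         0                 0
"""

if __name__ == "__main__":
    print(backlogged(parse_tpstats(SAMPLE)))
```

Run against the sample, only CompactionExecutor is flagged. In practice the text would come from capturing the output of nodetool tpstats on the affected node rather than from a hard-coded string.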
-- This message was sent by Atlassian JIRA (v6.3.4#6332)