Date: Thu, 22 Jun 2017 23:35:02 +0000 (UTC)
From: "Michael Shuler (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-13538) Cassandra tasks permanently block after the following assertion occurs during compaction: "java.lang.AssertionError: Interval min > max "

     [ https://issues.apache.org/jira/browse/CASSANDRA-13538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Shuler updated CASSANDRA-13538:
---------------------------------------
    Fix Version/s:     (was: 2.1.18)
                   2.1.x

> Cassandra tasks permanently block after the following assertion occurs during compaction: "java.lang.AssertionError: Interval min > max "
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13538
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13538
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>         Environment: This happens on a 7 node system with 2 data centers. We're using Cassandra version 2.1.15. I upgraded to 2.1.17 and it still occurs.
>            Reporter: Andy Klages
>             Fix For: 2.1.x
>
>         Attachments: cassandra.yaml, jstack.out, schema.cql3, system.log, tpstats.out
>
>
> We noticed this problem because the commitlogs proliferate to the point that we eventually run out of disk space. nodetool tpstats shows several of the tasks backed up:
> {code}
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> MutationStage                     0         0      134335315         0                 0
> ReadStage                         0         0      643986790         0                 0
> RequestResponseStage              0         0         114298         0                 0
> ReadRepairStage                   0         0             36         0                 0
> CounterMutationStage              0         0              0         0                 0
> MiscStage                         0         0              0         0                 0
> AntiEntropySessions               1         1          79357         0                 0
> HintedHandoff                     0         0             90         0                 0
> GossipStage                       0         0        6595098         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> InternalResponseStage             0         0        1638369         0                 0
> CommitLogArchiver                 0         0              0         0                 0
> CompactionExecutor                2       175        2922542         0                 0
> ValidationExecutor                0         0        1465374         0                 0
> MigrationStage                    1        76            600         0                 0
> AntiEntropyStage                  1       923        8291098         0                 0
> PendingRangeCalculator            0         0             20         0                 0
> Sampler                           0         0              0         0                 0
> MemtableFlushWriter               0         0          53017         0                 0
> MemtablePostFlush                 1      4584        1545141         0                 0
> MemtableReclaimMemory             0         0          70639         0                 0
> Native-Transport-Requests         0         0         352559         0                 0
> {code}
> This all starts after the following exception is raised in Cassandra:
> {code}
> ERROR [MemtableFlushWriter:2437] 2017-05-15 01:53:23,380 CassandraDaemon.java:231 - Exception in thread Thread[MemtableFlushWriter:2437,5,main]
> java.lang.AssertionError: Interval min > max
>     at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:249) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.utils.IntervalTree.<init>(IntervalTree.java:72) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:603) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:597) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:578) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.DataTracker$View.replaceFlushed(DataTracker.java:740) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:172) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1521) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
>     at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127) ~[apache-cassandra-2.1.15.jar:2.1.15]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_121]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_121]
>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {code}
> This has occurred on only one of our system testers' setups, but with regularity. I couldn't begin to tell you how to reproduce it. We have many systems deployed, but only this one setup encounters the issue. I have included the jstack output, config file, log file, and schema. I also have a heap dump available if needed. After looking at the heap dump, the best I can tell is that the assertion failure left a lock (i.e. a latch) in a locked state, which then causes a backlog of pending tasks.
> I'm hoping this assertion will mean something to the Cassandra development community and that it has perhaps been fixed in a newer release.
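
To illustrate the failure mode the reporter suspects (an assertion failure that leaves a latch unreleased, so dependent tasks queue up behind it), here is a minimal sketch. It is not Cassandra's code: the class LatchSketch, the method names, and the one-second timeout are all hypothetical and chosen only for illustration. It shows that an error thrown before CountDownLatch.countDown() leaves the waiter blocked, and that releasing the latch in a finally block avoids this.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from Cassandra.
final class LatchSketch {

    // Stand-in for the interval check that fails with "Interval min > max".
    static void checkInterval(int min, int max) {
        if (min > max) {
            throw new AssertionError("Interval min > max");
        }
    }

    // Runs one simulated flush on a worker thread and reports whether a
    // waiter on the latch was released within one second.
    static boolean waiterReleased(int min, int max, boolean releaseInFinally) {
        CountDownLatch flushDone = new CountDownLatch(1);
        ExecutorService flushWriter = Executors.newSingleThreadExecutor();
        try {
            flushWriter.execute(() -> {
                if (releaseInFinally) {
                    try {
                        checkInterval(min, max);
                    } finally {
                        flushDone.countDown(); // always releases the waiter
                    }
                } else {
                    checkInterval(min, max);
                    flushDone.countDown(); // skipped when the check throws
                }
            });
            // The waiter gives up after one second instead of blocking forever.
            return flushDone.await(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            flushWriter.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // The failing cases also print the worker's AssertionError to stderr.
        System.out.println("valid interval, no finally:   " + waiterReleased(1, 2, false));
        System.out.println("invalid interval, no finally: " + waiterReleased(2, 1, false));
        System.out.println("invalid interval, finally:    " + waiterReleased(2, 1, true));
    }
}
```

In the sketch the waiter uses a timed await so the demonstration terminates. A waiter that calls await() with no timeout would stay blocked indefinitely in the second case, which would be consistent with the growing Pending count shown for MemtablePostFlush above, although whether that is what actually happens here has not been confirmed.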
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)