Date: Mon, 10 Oct 2016 20:12:20 +0000 (UTC)
From: "Tom van der Woerdt (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Subject: [jira] [Commented] (CASSANDRA-12764) Compaction performance issues with many sstables, during transaction commit phase
archived-at: Mon, 10 Oct 2016 20:12:27 -0000

    [ https://issues.apache.org/jira/browse/CASSANDRA-12764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563347#comment-15563347 ]

Tom van der Woerdt commented on CASSANDRA-12764:
------------------------------------------------

Oh, nice to finally link the IRC name to the Jira name :)

Yes, it was a lot faster. Here's a graph showing what happened over the last four days: https://i.imgur.com/AdWCCrR.png (graphing inode usage, divide by 8 for sstable count)

The red line is the node that started the mess. A botched repair[1] produced a nice 100k sstables. This was noticed and cleaned up. Sadly, the node had already synced those 100k sstables to other nodes, which duly started compacting the large number of files away. But then the regular automation jobs started a repair on the node I had wiped, streaming all the files all over the place :( Sadly I was unaware of this until it was too late, and suddenly a lot of nodes in the cluster had 100k sstables :)

The sstable count was going down (very, very slowly), but I figured I'd hop on IRC, where [~jjirsa] and [~brandon.williams] helped find a workaround (the table move). I applied it to the most broken node first. On the graph it's the red line; look for the slope at the 10/10 boundary. This morning my script broke and the final sstables went the slow route, but it finished, and as you can see the scripted version is much faster than just letting compaction run.

I'm in the process of applying it to the two most broken nodes now, and will let the others just finish.

Anyway, that's the story of how this happened, which was totally my fault :) Now I'm just hoping that my mistake can lead to improvements in compaction performance.

Tom

[1]: subrange repair (similar to BrianGallew's code) on an LCS table, with 256 vnodes, and most data not passing validation.

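For readers wondering why plain compaction was so slow here: the stack trace quoted below in the issue description shows `LifecycleTransaction.checkpoint` ending up in `SSTableIntervalTree.build`, which sorts while constructing the tree. The following is only a back-of-the-envelope sketch of what that could mean for the total work, not Cassandra code. It assumes one full rebuild per compaction over all live sstables, and approximates a rebuild as `n * log2(n)` comparisons; both the cost formula and the function names are illustrative assumptions.

```python
import math
from typing import Tuple


def rebuild_cost(live_sstables: int) -> float:
    """Assumed cost of one full rebuild: roughly n * log2(n) comparisons."""
    if live_sstables < 2:
        return 0.0
    return live_sstables * math.log2(live_sstables)


def total_rebuild_cost(start: int, inputs_per_compaction: int,
                       outputs_per_compaction: int = 1) -> Tuple[int, float]:
    """Sum the assumed rebuild cost over successive compactions.

    Each compaction replaces `inputs_per_compaction` sstables with
    `outputs_per_compaction` sstables, then pays for one rebuild over
    whatever is still live. Stops when too few sstables remain.
    """
    if inputs_per_compaction <= outputs_per_compaction:
        raise ValueError("each compaction must reduce the sstable count")
    live = start
    compactions = 0
    cost = 0.0
    while live >= inputs_per_compaction:
        live -= inputs_per_compaction - outputs_per_compaction
        cost += rebuild_cost(live)
        compactions += 1
    return compactions, cost


if __name__ == "__main__":
    # 100k sstables as in this ticket; 32 and 4 inputs are made-up batch sizes.
    for batch in (32, 4):
        n, cost = total_rebuild_cost(100_000, batch)
        print(f"{batch} inputs per compaction: {n} compactions, "
              f"~{cost:.3g} comparisons spent on rebuilds")
```

Under these assumptions the rebuild work grows roughly quadratically with the starting sstable count, because every small compaction pays for a structure sized by everything still live. That would also be consistent with the table move helping: it shrinks the live set each rebuild has to cover.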
> Compaction performance issues with many sstables, during transaction commit phase
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12764
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12764
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Tom van der Woerdt
>              Labels: lcs
>
> An issue with a script flooded my cluster with sstables. There is now a table with 100k sstables, all on the order of KBytes, and it's taking a long time (ETA 20 days) to compact, even though the table is only ~30GB.
> Stack trace :
> {noformat}
> "CompactionExecutor:308" #7541 daemon prio=1 os_prio=4 tid=0x00007fa22af35400 nid=0x41eb runnable [0x00007fdbea48d000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.TimSort.countRunAndMakeAscending(TimSort.java:360)
>         at java.util.TimSort.sort(TimSort.java:220)
>         at java.util.Arrays.sort(Arrays.java:1438)
>         at com.google.common.collect.Ordering.sortedCopy(Ordering.java:817)
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:209)
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:210)
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:210)
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:210)
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:210)
>         at org.apache.cassandra.utils.IntervalTree.<init>(IntervalTree.java:50)
>         at org.apache.cassandra.db.lifecycle.SSTableIntervalTree.<init>(SSTableIntervalTree.java:40)
>         at org.apache.cassandra.db.lifecycle.SSTableIntervalTree.build(SSTableIntervalTree.java:50)
>         at org.apache.cassandra.db.lifecycle.View$4.apply(View.java:288)
>         at org.apache.cassandra.db.lifecycle.View$4.apply(View.java:283)
>         at com.google.common.base.Functions$FunctionComposition.apply(Functions.java:216)
>         at org.apache.cassandra.db.lifecycle.Tracker.apply(Tracker.java:128)
>         at org.apache.cassandra.db.lifecycle.Tracker.apply(Tracker.java:101)
>         at org.apache.cassandra.db.lifecycle.LifecycleTransaction.checkpoint(LifecycleTransaction.java:307)
>         at org.apache.cassandra.db.lifecycle.LifecycleTransaction.checkpoint(LifecycleTransaction.java:288)
>         at org.apache.cassandra.io.sstable.SSTableRewriter.doPrepare(SSTableRewriter.java:368)
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173)
>         at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.doPrepare(CompactionAwareWriter.java:84)
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173)
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish(Transactional.java:184)
>         at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.finish(CompactionAwareWriter.java:94)
>         at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:194)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78)
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61)
>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:263)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> IntervalTree shows in a lot of stack traces I've taken on several nodes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)