Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Mon, 10 Oct 2016 20:12:20 +0000 (UTC)
From: "Tom van der Woerdt (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.13010882.1476054088000.786315.1476130340526@Atlassian.JIRA>
In-Reply-To: <JIRA.13010882.1476054088000@Atlassian.JIRA>
References: <JIRA.13010882.1476054088000@Atlassian.JIRA> <JIRA.13010882.1476054088911@arcas>
Subject: [jira] [Commented] (CASSANDRA-12764) Compaction performance issues
 with many sstables, during transaction commit phase
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Mon, 10 Oct 2016 20:12:27 -0000


    [ https://issues.apache.org/jira/browse/CASSANDRA-12764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563347#comment-15563347 ] 

Tom van der Woerdt commented on CASSANDRA-12764:
------------------------------------------------

Oh, nice to finally link the IRC name to the Jira name :)

Yes, it was a lot faster. Here's a graph showing what happened over the last four days: https://i.imgur.com/AdWCCrR.png (it graphs inode usage; divide by 8 to get the sstable count)
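The "divide by 8" rule assumes that each sstable occupies eight files on disk. That matches a typical compressed 3.x sstable: Data, Index, Filter, Summary, Statistics, CompressionInfo, Digest and TOC components. The following is a minimal sketch that puts that estimate next to an exact count of the `-Data.db` files in one table directory. The class and method names are hypothetical, and the eight-components figure is an assumption that may not hold for every table configuration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Sketch: two ways to gauge how many sstables a single table directory holds. */
final class SSTableCounter {
    // Assumption: every sstable consists of this many component files
    // (Data, Index, Filter, Summary, Statistics, CompressionInfo, Digest, TOC).
    static final int COMPONENTS_PER_SSTABLE = 8;

    private SSTableCounter() {}

    /** The "divide inode usage by 8" shortcut used for the graph. */
    static long estimateFromFileCount(long files) {
        return files / COMPONENTS_PER_SSTABLE;
    }

    /** Exact count: each sstable has exactly one -Data.db component. */
    static long countDataFiles(Path tableDir) throws IOException {
        try (Stream<Path> entries = Files.list(tableDir)) {
            return entries.filter(p -> p.getFileName().toString().endsWith("-Data.db")).count();
        }
    }

    /** Regular files directly in the directory (not recursing into snapshots/ or backups/). */
    static long countFiles(Path tableDir) throws IOException {
        try (Stream<Path> entries = Files.list(tableDir)) {
            return entries.filter(Files::isRegularFile).count();
        }
    }
}
```

At the scale described here, 100k sstables corresponds to roughly 800k inodes. If the two counts disagree noticeably, the eight-files assumption probably does not hold for that table.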

The red line is the node that started the mess. A botched repair[1] produced a nice 100k sstables. This was noticed and cleaned up.

Unfortunately, it had already streamed those 100k sstables to other nodes, which correctly started compacting the large number of files away. But then the regular automation jobs started a repair on the node I had wiped, streaming all the files all over the place :( I was unaware of this until it was too late, and suddenly many nodes in the cluster had 100k sstables :)

The sstable count was going down, but very, very slowly, so I figured I'd hop on IRC, where [~jjirsa] and [~brandon.williams] helped find a workaround (the table move). I applied it to the most broken node first; on the graph that's the red line, so look for the slope at the 10/10 boundary. This morning my script broke and the final sstables went the slow route, but it finished, and as you can see the scripted version is much faster than just letting compaction run. I'm in the process of applying it to the two most broken nodes now, and will let the others finish on their own.

Anyway, that's the story of how this happened, which was totally my fault :) Now I'm just hoping that my mistake can lead to improvements in compaction performance.

Tom


[1]: subrange repair (similar to BrianGallew's code) on an LCS table, with 256 vnodes, and most data not passing validation.
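Subrange repair splits each vnode's token range into smaller slices and repairs them one at a time. The following is a minimal sketch of the splitting step, assuming Murmur3Partitioner's signed 64-bit tokens on a ring of size 2^64. It is not BrianGallew's code, and `SubrangeSplitter` and `split` are hypothetical names. The multiplication is what makes this setup fragile. With 256 vnodes and, say, 100 slices per range (an assumed figure), a node runs 256 × 100 = 25,600 repair sessions. When most data fails validation, each of those sessions can stream in its own small sstables.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: split one token range (start, end] into contiguous slices for subrange repair.
 * Assumes Murmur3Partitioner-style signed 64-bit tokens on a ring of size 2^64.
 */
final class SubrangeSplitter {
    private static final BigInteger RING = BigInteger.ONE.shiftLeft(64);

    private SubrangeSplitter() {}

    /** Returns {start, end} pairs; each slice is (start, end] and together they tile the input. */
    static List<long[]> split(long start, long end, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        // Width of (start, end] modulo 2^64, so ranges that wrap past Long.MAX_VALUE work.
        BigInteger width = BigInteger.valueOf(end).subtract(BigInteger.valueOf(start)).mod(RING);
        if (width.signum() == 0) {
            width = RING; // in this sketch's convention, start == end denotes the whole ring
        }
        if (width.compareTo(BigInteger.valueOf(parts)) < 0) {
            // An empty slice (s, s] would mean the whole ring under that convention, so refuse.
            throw new IllegalArgumentException("range too narrow for " + parts + " slices");
        }
        List<long[]> slices = new ArrayList<>(parts);
        long prev = start;
        for (int i = 1; i <= parts; i++) {
            BigInteger offset = width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(parts));
            // longValue() keeps the low 64 bits, wrapping back into signed token space.
            long next = BigInteger.valueOf(start).add(offset).longValue();
            slices.add(new long[] {prev, next});
            prev = next;
        }
        return slices;
    }
}
```

For example, `split(0L, 100L, 4)` yields the slices (0, 25], (25, 50], (50, 75] and (75, 100].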

> Compaction performance issues with many sstables, during transaction commit phase
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12764
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12764
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Tom van der Woerdt
>              Labels: lcs
>
> An issue with a script flooded my cluster with sstables. There is now a table with 100k sstables, all on the order of KBytes, and it's taking a long time (ETA 20 days) to compact, even though the table is only ~30GB.
> Stack trace:
> {noformat}
> "CompactionExecutor:308" #7541 daemon prio=1 os_prio=4 tid=0x00007fa22af35400 nid=0x41eb runnable [0x00007fdbea48d000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.util.TimSort.countRunAndMakeAscending(TimSort.java:360)
> 	at java.util.TimSort.sort(TimSort.java:220)
> 	at java.util.Arrays.sort(Arrays.java:1438)
> 	at com.google.common.collect.Ordering.sortedCopy(Ordering.java:817)
> 	at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:209)
> 	at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
> 	at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
> 	at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
> 	at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
> 	at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
> 	at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
> 	at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:211)
> 	at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:210)
> 	at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:210)
> 	at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:210)
> 	at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:210)
> 	at org.apache.cassandra.utils.IntervalTree.<init>(IntervalTree.java:50)
> 	at org.apache.cassandra.db.lifecycle.SSTableIntervalTree.<init>(SSTableIntervalTree.java:40)
> 	at org.apache.cassandra.db.lifecycle.SSTableIntervalTree.build(SSTableIntervalTree.java:50)
> 	at org.apache.cassandra.db.lifecycle.View$4.apply(View.java:288)
> 	at org.apache.cassandra.db.lifecycle.View$4.apply(View.java:283)
> 	at com.google.common.base.Functions$FunctionComposition.apply(Functions.java:216)
> 	at org.apache.cassandra.db.lifecycle.Tracker.apply(Tracker.java:128)
> 	at org.apache.cassandra.db.lifecycle.Tracker.apply(Tracker.java:101)
> 	at org.apache.cassandra.db.lifecycle.LifecycleTransaction.checkpoint(LifecycleTransaction.java:307)
> 	at org.apache.cassandra.db.lifecycle.LifecycleTransaction.checkpoint(LifecycleTransaction.java:288)
> 	at org.apache.cassandra.io.sstable.SSTableRewriter.doPrepare(SSTableRewriter.java:368)
> 	at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173)
> 	at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.doPrepare(CompactionAwareWriter.java:84)
> 	at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173)
> 	at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish(Transactional.java:184)
> 	at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.finish(CompactionAwareWriter.java:94)
> 	at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:194)
> 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> 	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78)
> 	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61)
> 	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:263)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> IntervalTree shows up in a lot of the stack traces I've taken on several nodes.
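The trace shows where the time goes. `LifecycleTransaction.checkpoint()` calls `Tracker.apply()`. The resulting `View` update then calls `SSTableIntervalTree.build()`, which constructs a new `IntervalTree`. Its recursive `IntervalNode` constructors each sort a copy of their intervals (`Ordering.sortedCopy`). The tree appears to be rebuilt over every live sstable rather than only the ones being compacted. If that is the case, each checkpoint costs at least O(n log n) in the total sstable count n, so with 100k sstables even a compaction of a few tiny files pays for sorting the whole set. A simplified centered interval tree with the same general shape (sort at each node, split into left, right and center-straddling intervals, recurse) looks roughly like this. It is an illustration only, not Cassandra's `IntervalTree`, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Simplified centered interval tree over closed [min, max] intervals (illustrative only). */
final class SimpleIntervalTree {
    /** Closed interval, e.g. the first and last token covered by one sstable. */
    static final class Interval {
        final long min, max;
        Interval(long min, long max) { this.min = min; this.max = max; }
    }

    private final Node root;

    SimpleIntervalTree(List<Interval> intervals) {
        root = intervals.isEmpty() ? null : new Node(intervals);
    }

    /** All intervals that contain the given point. */
    List<Interval> search(long point) {
        List<Interval> out = new ArrayList<>();
        Node n = root;
        while (n != null) {
            if (point < n.center) {
                // Straddling intervals all end at or after center, so only min matters here.
                for (Interval i : n.byMin) { if (i.min > point) break; out.add(i); }
                n = n.left;
            } else if (point > n.center) {
                // Straddling intervals all start at or before center, so only max matters here.
                for (Interval i : n.byMaxDesc) { if (i.max < point) break; out.add(i); }
                n = n.right;
            } else {
                out.addAll(n.byMin);
                break;
            }
        }
        return out;
    }

    private static final class Node {
        final long center;
        final List<Interval> byMin, byMaxDesc;
        final Node left, right;

        Node(List<Interval> intervals) {
            // Sort all 2n endpoints to pick a median center: O(n log n) work at this node.
            List<Long> endpoints = new ArrayList<>(intervals.size() * 2);
            for (Interval i : intervals) { endpoints.add(i.min); endpoints.add(i.max); }
            Collections.sort(endpoints);
            center = endpoints.get(intervals.size());

            List<Interval> l = new ArrayList<>(), r = new ArrayList<>(), mid = new ArrayList<>();
            for (Interval i : intervals) {
                if (i.max < center) l.add(i);
                else if (i.min > center) r.add(i);
                else mid.add(i);
            }
            // Two more sorted copies of the intervals that straddle the center.
            byMin = new ArrayList<>(mid);
            byMin.sort(Comparator.comparingLong(i -> i.min));
            byMaxDesc = new ArrayList<>(mid);
            byMaxDesc.sort(Comparator.comparingLong((Interval i) -> i.max).reversed());
            // The interval owning the center endpoint always lands in mid, so recursion shrinks.
            left = l.isEmpty() ? null : new Node(l);
            right = r.isEmpty() ? null : new Node(r);
        }
    }
}
```

Each level of recursion sorts the endpoints it receives, so a balanced build is roughly O(n log^2 n). Repeating that on every checkpoint is consistent with IntervalTree dominating the stack traces.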


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)