cassandra-commits mailing list archives

From "T Jake Luciani (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-11697) Improve Compaction Throughput
Date Wed, 06 Jul 2016 20:48:11 GMT


T Jake Luciani commented on CASSANDRA-11697:

Just a general note about my findings so far:

There seems to be some conflation around compaction performance, mainly the assumption that low IO
throughput means compaction is slower than with high IO throughput. Our compaction is primarily CPU
bound, and since we compress by default, compaction can appear slow when it is actually processing
roughly the same rows/sec as uncompressed. Here is an example comparing compressed and uncompressed
compactions; notice that the new log message from CASSANDRA-10805 shows roughly the same rows/sec.

INFO  20:22:01 Compacted (3206a030-43b7-11e6-bf87-63e169949442) 2 sstables to to level=0.
  782.013MiB to 361.930MiB (~46% of original) in 43,211ms.
  Read Throughput = 18.097MiB/s, Write Throughput = 8.376MiB/s,
  Row Throughput = ~454,545/s. 20,000,000 total partitions merged to 10,000,000.
  Partition merge counts were {2:10000000, }

INFO  20:27:49 Compacted (00d23b40-43b8-11e6-ad90-8f9393166cb4) 2 sstables to to level=0.
  123.201MiB to 57.405MiB (~46% of original) in 44,527ms.
  Read Throughput = 2.767MiB/s, Write Throughput = 1.289MiB/s,
  Row Throughput = ~444,444/s. 20,000,000 total partitions merged to 10,000,000.
  Partition merge counts were {2:10000000, }
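As a quick check of the point above, the sketch below recomputes throughput from the figures in the two log messages (sizes, durations and partition counts copied verbatim). It assumes `Row Throughput` in these messages is total input partitions divided by duration: the logged ~454,545/s and ~444,444/s match 20,000,000 divided by whole-second durations of 44 s and 45 s, so the exact-millisecond figures computed here differ slightly. The 123 MiB run has a much smaller on-disk input for the same 20,000,000 partitions, which is consistent with it being the compressed table.

```python
# Figures copied from the two "Compacted" log messages above:
# (input MiB, output MiB, duration in ms, total input partitions)
RUNS = {
    "782 MiB run": (782.013, 361.930, 43_211, 20_000_000),
    "123 MiB run": (123.201, 57.405, 44_527, 20_000_000),
}


def throughput(read_mib, write_mib, duration_ms, partitions):
    """Return (read MiB/s, write MiB/s, input partitions/s) for one compaction."""
    seconds = duration_ms / 1000.0
    return read_mib / seconds, write_mib / seconds, partitions / seconds


if __name__ == "__main__":
    results = {name: throughput(*figures) for name, figures in RUNS.items()}
    for name, (read_s, write_s, rows_s) in results.items():
        print(f"{name}: read {read_s:.3f} MiB/s, write {write_s:.3f} MiB/s, "
              f"{rows_s:,.0f} partitions/s")
    big, small = results["782 MiB run"], results["123 MiB run"]
    print(f"read MiB/s ratio:   {big[0] / small[0]:.2f}x")  # 6.54x
    print(f"partitions/s ratio: {big[2] / small[2]:.2f}x")  # 1.03x
```

Read MiB/s differs by roughly 6.5x between the two runs while partitions/s differs by about 3%, which is exactly the gap between the two ways of reporting compaction speed.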

*I'd like to start talking about compaction in terms of row throughput rather than MB/sec.*
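If compaction speed is discussed in rows/sec, that figure can be pulled straight out of the log message. A minimal sketch, assuming the message format shown in the two log entries above (other Cassandra versions may word it differently); `parse_compaction_throughput` is a hypothetical helper, not part of Cassandra. The patterns allow whitespace, including line breaks, inside field names because these messages often get wrapped, as they were in this email.

```python
import re

# Field patterns for the "Compacted (...)" message format quoted above.
# \s+ tolerates line wrapping inside a field name (e.g. "Write\nThroughput").
_FIELDS = {
    "read_mib_s": re.compile(r"Read\s+Throughput\s*=\s*([\d.,]+)MiB/s"),
    "write_mib_s": re.compile(r"Write\s+Throughput\s*=\s*([\d.,]+)MiB/s"),
    "rows_s": re.compile(r"Row\s+Throughput\s*=\s*~?([\d.,]+)/s"),
}


def parse_compaction_throughput(message: str) -> dict:
    """Hypothetical helper: extract MiB/s and rows/s from one compaction log message."""
    result = {}
    for name, pattern in _FIELDS.items():
        match = pattern.search(message)
        if match is None:
            raise ValueError(f"{name} not found in log message")
        # Drop thousands separators such as "444,444" before converting.
        result[name] = float(match.group(1).replace(",", ""))
    return result


if __name__ == "__main__":
    sample = ("123.201MiB to 57.405MiB (~46% of original) in 44,527ms. "
              "Read Throughput = 2.767MiB/s, Write\n"
              "Throughput = 1.289MiB/s, Row Throughput = ~444,444/s.")
    print(parse_compaction_throughput(sample))
    # {'read_mib_s': 2.767, 'write_mib_s': 1.289, 'rows_s': 444444.0}
```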

There are some things we can do to make our current compaction faster; CASSANDRA-10309, for example,
should give us a ~10% boost. But the *bigger* issue, and what I think most users are seeing in the
real world, is that under load compaction effectively grinds to a halt. I can easily produce a 10x
drop in throughput by applying load during a regular compaction.
This, to me, is the problem we should be focused on, since we can see it when looking at sample
user logs. I have a POC I'm working on to address this that I'll be presenting soon.

> Improve Compaction Throughput
> -----------------------------
>                 Key: CASSANDRA-11697
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
> The goal of this ticket is to improve/understand the bottlenecks during compactions. At a high level this will involve:
> * A test system for measuring compaction time for different workloads and compaction
> * Profiling and analysis
> * Make improvements
> * Add throughput regression tests so we can track
> We have a lot of random tickets that relate to this, so I'll link them to this ticket.
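The last item in the plan, throughput regression tests, could gate on rows/sec rather than MB/sec. A minimal sketch under assumed names: `row_throughput`, `passes_throughput_gate` and the 10% tolerance are hypothetical choices for illustration, not an existing Cassandra test API. The usage reuses the figures from the two compaction logs in the comment: the same pair of runs passes when compared on rows/sec but would be flagged when compared on read MiB/s.

```python
def row_throughput(partitions: int, duration_ms: int) -> float:
    """Partitions processed per second, from a partition count and a duration in ms."""
    return partitions / (duration_ms / 1000.0)


def passes_throughput_gate(baseline: float, measured: float,
                           tolerance: float = 0.10) -> bool:
    """Hypothetical regression gate: True unless `measured` drops more than
    `tolerance` (a fraction) below `baseline`. Faster runs always pass."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return measured >= baseline * (1.0 - tolerance)


if __name__ == "__main__":
    # Partition counts and durations from the two compaction logs above.
    baseline = row_throughput(20_000_000, 43_211)
    measured = row_throughput(20_000_000, 44_527)
    print(passes_throughput_gate(baseline, measured))  # True: rows/s within 10%
    # The same gate on read MiB/s (18.097 vs 2.767) reports a regression.
    print(passes_throughput_gate(18.097, 2.767))       # False
```

Gating only on drops keeps the check one-sided, so an optimization like CASSANDRA-10309 never fails the test; a stricter harness might also store the baseline per workload and compaction strategy.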

This message was sent by Atlassian JIRA
