cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)"
Subject [jira] [Created] (CASSANDRA-12071) Regression in flushing throughput under load after CASSANDRA-6696
Date Wed, 22 Jun 2016 20:49:16 GMT
Ariel Weisberg created CASSANDRA-12071:

             Summary: Regression in flushing throughput under load after CASSANDRA-6696
                 Key: CASSANDRA-12071
             Project: Cassandra
          Issue Type: Bug
          Components: Local Write-Read Paths
            Reporter: Ariel Weisberg

Flushing used to work such that a ColumnFamilyStore could have multiple memtables flushing
at once. Now only a single flush of any memtable can run in the C* process at a time, and
the number of threads applied to that flush is bounded by the number of disks in the JBOD
configuration.
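
As a rough illustration of the difference described above, here is a minimal Java sketch. It is a toy model, not Cassandra code: the class, the method names, and the flushPoolSize parameter are all hypothetical, and it only encodes the two rules stated in this report.

```java
// Illustrative model only; these are not Cassandra classes or settings.
public final class FlushConcurrencyModel {

    /**
     * Old behavior as described in the report: several memtables may flush at once.
     * flushPoolSize is a hypothetical cap on simultaneous flushes.
     */
    public static int concurrentFlushesBefore(int pendingMemtables, int flushPoolSize) {
        return Math.min(pendingMemtables, flushPoolSize);
    }

    /**
     * New behavior as described in the report: one flush at a time in the process,
     * with its threads bounded by the number of data disks.
     */
    public static int flushThreadsAfter(int pendingMemtables, int dataDisks) {
        if (pendingMemtables == 0) {
            return 0;
        }
        return Math.max(1, dataDisks);
    }

    public static void main(String[] args) {
        // Four memtables waiting to flush, one non-JBOD data disk.
        System.out.println(concurrentFlushesBefore(4, 4));
        System.out.println(flushThreadsAfter(4, 1));
    }
}
```

Under this model, a single-disk (non-JBOD) node gets one flush thread no matter how many memtables are waiting, which is the situation the rest of this report is about.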

This works OK most of the time, but occasionally flushing is a little slower, ingest
outstrips it, and writes then block on available memory. At that point you see stalls of
several seconds that cause timeouts.
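
The arithmetic behind the stall can be sketched as follows. This is a back-of-the-envelope model under assumed constant rates, not a measurement. The class name, parameters, and the numbers in main are hypothetical and were not taken from the test run.

```java
// Back-of-the-envelope model; all rates and sizes are hypothetical inputs.
public final class FlushStallModel {

    /**
     * Returns the number of seconds until free memtable memory is exhausted,
     * assuming constant ingest and flush rates, or -1 if flushing keeps up.
     */
    public static double secondsUntilStall(double freeMb,
                                           double ingestMbPerSec,
                                           double flushMbPerSec) {
        double netFillRate = ingestMbPerSec - flushMbPerSec;
        if (netFillRate <= 0) {
            return -1; // flushing is at least as fast as ingest: no stall
        }
        return freeMb / netFillRate;
    }

    public static void main(String[] args) {
        // Made-up example: 2000 MB free, ingest slightly ahead of flush.
        System.out.println(secondsUntilStall(2000, 300, 250));
    }
}
```

The point of the model is that even a small gap between ingest and flush throughput eventually exhausts memory, after which writers can only proceed as fast as the single flush frees space.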

This is a problem for reasonable configurations that don't use JBOD but do have a fast
disk that can handle some IO queuing (RAID, SSD).

You can reproduce this on beefy hardware (12 cores / 24 threads, 64 GB of RAM, SSD) if you
unthrottle compaction, or set it to something like 64 MB/s, run with 8 compaction threads,
and run stress with the default write workload and a reasonable number of threads. I tested with

The stalls started happening after about 60 gigabytes of data had been loaded.
