cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-12071) Regression in flushing throughput under load after CASSANDRA-6696
Date Thu, 23 Jun 2016 03:13:16 GMT


Ariel Weisberg commented on CASSANDRA-12071:

I am seeing this against a single table. I don't think the number of tables matters. The memory
pool is global for the process and shared by all tables.
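
A minimal sketch of what that shared pool implies, using hypothetical names (GlobalMemtablePoolSketch, allocate/release) rather than Cassandra's actual allocator classes: every table's writes draw from one process-wide budget, so writers on any table block once flushing cannot reclaim memory fast enough.

{code:java}
import java.util.concurrent.Semaphore;

// Hypothetical stand-in for a process-wide memtable memory pool. All tables
// allocate from the same budget, so a flushing slowdown backs up every writer.
public class GlobalMemtablePoolSketch {
    private final Semaphore budgetBytes;

    GlobalMemtablePoolSketch(int totalBytes) {
        this.budgetBytes = new Semaphore(totalBytes);
    }

    // Write path for any table: blocks here when flushing falls behind and
    // no memory can be reclaimed.
    void allocate(int bytes) throws InterruptedException {
        budgetBytes.acquire(bytes);
    }

    // Called as a completed flush releases a Memtable's memory.
    void release(int bytes) {
        budgetBytes.release(bytes);
    }
}
{code}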

The parallelism is reduced both for a single table and across tables. The issue is that a
single-threaded executor kicks off flushes and then waits on the result of whatever
parallelism is available for a single Memtable flush. The parallelism for a single
Memtable flush is set to the # of JBOD disks.
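
Here is a minimal sketch of that arrangement, with hypothetical names (FlushCoordinatorSketch, submitFlush) rather than the actual Cassandra classes: one single-threaded executor serializes all flushes, and each flush fans out only across the per-disk writers before the lone flush thread blocks on the results.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the flush path described above.
public class FlushCoordinatorSketch {
    // Only one flush can be in flight at a time in the whole process.
    private final ExecutorService flushExecutor = Executors.newSingleThreadExecutor();
    // Per-flush parallelism is capped at the number of JBOD disks.
    private final ExecutorService perDiskWriters;
    private final int diskCount;

    FlushCoordinatorSketch(int diskCount) {
        this.diskCount = diskCount;
        this.perDiskWriters = Executors.newFixedThreadPool(diskCount);
    }

    Future<?> submitFlush(Runnable writeOneDiskRange) {
        return flushExecutor.submit(() -> {
            List<Future<?>> parts = new ArrayList<>();
            for (int disk = 0; disk < diskCount; disk++)
                parts.add(perDiskWriters.submit(writeOneDiskRange));
            // The single flush thread waits here, so no other Memtable or
            // ColumnFamilyStore can even start flushing until this completes.
            for (Future<?> part : parts) {
                try {
                    part.get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
        });
    }
}
{code}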

> Regression in flushing throughput under load after CASSANDRA-6696
> -----------------------------------------------------------------
>                 Key: CASSANDRA-12071
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Ariel Weisberg
>            Assignee: Marcus Eriksson
> The way flushing used to work is that a ColumnFamilyStore could have multiple Memtables
> flushing at once and multiple ColumnFamilyStores could flush at the same time. The way it
> works now, there can be only a single flush of any ColumnFamilyStore & Memtable running
> in the C* process, and the number of threads applied to that flush is bounded by the number
> of disks in JBOD.
> This works OK most of the time, but occasionally flushing will be a little slower, ingest
> will outstrip it, and writes then block on available memory. At that point you see
> several-second stalls that cause timeouts.
> This is a problem for reasonable configurations that don't use JBOD but have access to
> a fast disk that can handle some IO queuing (RAID, SSD).
> You can reproduce on beefy hardware (12 cores/24 threads, 64 GB of RAM, SSD) if you
> unthrottle compaction or set it to something like 64 megabytes/second, run with 8 compaction
> threads, and drive stress with the default write workload and a reasonable number of threads.
> I tested with 96.
> It started happening after about 60 gigabytes of data was loaded.
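
For contrast with the quoted description, a minimal sketch of the earlier arrangement, again with hypothetical names (LegacyFlushSketch, memtableFlushWriters) rather than the real classes: a multi-threaded flush pool lets several Memtables flush at once.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the pre-6696 shape: a fixed pool of flush
// writers, so several Memtables (possibly from different ColumnFamilyStores)
// can flush concurrently and one slow flush does not stall the others.
public class LegacyFlushSketch {
    private final ExecutorService flushWriters;

    LegacyFlushSketch(int memtableFlushWriters) {
        this.flushWriters = Executors.newFixedThreadPool(memtableFlushWriters);
    }

    // Each call flushes one Memtable; up to memtableFlushWriters of these
    // run in parallel instead of queuing behind a single flush thread.
    void submitFlush(Runnable writeMemtableToDisk) {
        flushWriters.submit(writeMemtableToDisk);
    }
}
{code}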
