cassandra-commits mailing list archives

From "Joseph Lynch (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-14898) Key cache loading is very slow when there are many SSTables
Date Fri, 16 Nov 2018 05:41:00 GMT


Joseph Lynch updated CASSANDRA-14898:
    Labels: Performance low-hanging-fruit  (was: )

> Key cache loading is very slow when there are many SSTables
> -----------------------------------------------------------
>                 Key: CASSANDRA-14898
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>         Environment: AWS i3.2xlarge, 4 physical cores (8 threads), 60GB of RAM, loading
about 8MB of KeyCache with 10k keys in it.
>            Reporter: Joseph Lynch
>            Priority: Minor
>              Labels: Performance, low-hanging-fruit
>         Attachments: key_cache_load_slow.svg
> While dealing with a production issue today where some 3.0.17 nodes had close to 8k sstables on disk due to excessive write pressure, we had a few nodes crash due to OOM, and they then took close to 17 minutes to load the key cache and recover. This excessive key cache load significantly increased the duration of the outage (to mitigate it we just removed the saved key cache files). For example, here is one node taking 17 minutes to load 10k keys, or about 10 keys per second (which is ... very slow):
> {noformat}
> INFO  [pool-3-thread-1] 2018-11-15 21:50:21,885 - reading saved cache /mnt/data/cassandra/saved_caches/KeyCache-d.db
> INFO  [pool-3-thread-1] 2018-11-15 22:07:16,490 - Completed loading (1014606 ms; 10103 keys) KeyCache cache
> {noformat}
> I've witnessed similar behavior in the past with large LCS clusters, and indeed it appears that any time the number of sstables is large, KeyCache loading takes a _really_ long time. Today I got a flame graph, and I believe I found the issue and that it's reasonably easy to fix. From what I can tell, the {{KeyCacheSerializer::deserialize}} method, which is called for every key, is linear in the number of sstables due to the call to {{ColumnFamilyStore::getSSTables}}, which ends up calling {{View::select}}. The {{View::select}} call is linear in the number of sstables and causes a _lot_ of {{HashSet}} resizing when the number of sstables is much greater than 16 (the default size of the backing {{HashMap}}).
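> To make the shape of the problem concrete, here is a rough, self-contained sketch of that access pattern: O(sstables) set-building work repeated once per saved key. This is not the actual Cassandra code; the class, method bodies, and counts below are purely illustrative.
> {code:java}
> import java.util.ArrayList;
> import java.util.HashSet;
> import java.util.List;
> import java.util.Set;
>
> public class KeyCacheLoadShape {
>     static final int SSTABLE_COUNT = 8_000;  // roughly the sstable count in the incident
>     static final int KEY_COUNT = 10_103;     // keys in the saved KeyCache file
>
>     // Stand-in for View::select: copies every live sstable into a brand new,
>     // default-sized (16-entry) HashSet, so each call is O(sstables) plus rehashing.
>     static Set<Integer> select(List<Integer> liveSSTables) {
>         Set<Integer> result = new HashSet<>();
>         for (Integer generation : liveSSTables)
>             result.add(generation);
>         return result;
>     }
>
>     public static void main(String[] args) {
>         List<Integer> liveSSTables = new ArrayList<>();
>         for (int gen = 0; gen < SSTABLE_COUNT; gen++)
>             liveSSTables.add(gen);
>
>         long start = System.nanoTime();
>         // Stand-in for KeyCacheSerializer::deserialize being invoked once per
>         // saved key: every key pays the full cost of the getSSTables/select copy.
>         for (int key = 0; key < KEY_COUNT; key++)
>             select(liveSSTables);
>         long elapsedMs = (System.nanoTime() - start) / 1_000_000;
>         System.out.printf("%d keys x %d sstables -> %d ms of pure set building%n",
>                           KEY_COUNT, SSTABLE_COUNT, elapsedMs);
>     }
> }
> {code}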
> As we see in the attached flamegraph, we spend 50% of our CPU time in these {{getSSTables}} calls, of which 36% is spent adding sstables to the {{HashSet}} in {{View::select}} and 17% is spent just iterating the sstables in the first place. A full 16% of CPU time is spent _just resizing the HashMap_. Then another 4% is spent calling {{CacheService::findDesc}}, which does a linear search for the sstable generation.
> I believe this affects at least Cassandra 3.0.17 and trunk, and it could be pretty easily fixed by either caching the {{getSSTables}} call or, at the very least, pre-sizing the {{HashSet}} in {{View::select}} to the size of the sstables map.
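> A minimal sketch of the pre-sizing idea, assuming {{View::select}} currently copies the live sstables into a default-sized {{HashSet}} (the class and method names below are illustrative, not the actual Cassandra code):
> {code:java}
> import java.util.Collection;
> import java.util.HashSet;
> import java.util.Set;
>
> class PreSizedSelectSketch {
>     // Before: the default capacity of 16 forces the backing HashMap to rehash
>     // repeatedly as thousands of sstables are added, once per deserialized key.
>     static <T> Set<T> selectDefaultSized(Collection<T> liveSSTables) {
>         Set<T> result = new HashSet<>();
>         result.addAll(liveSSTables);
>         return result;
>     }
>
>     // After: pre-size using the standard expected-size formula (size / loadFactor + 1)
>     // so the backing map is allocated once and never resized during the copy.
>     static <T> Set<T> selectPreSized(Collection<T> liveSSTables) {
>         Set<T> result = new HashSet<>((int) (liveSSTables.size() / 0.75f) + 1);
>         result.addAll(liveSSTables);
>         return result;
>     }
> }
> {code}
> Pre-sizing only removes the resizing cost; caching the {{getSSTables}} result for the duration of the cache load would also avoid paying the O(sstables) copy once per key, which is why it's listed as the first option above.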
