cassandra-commits mailing list archives

From "Rick Branson (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6345) Endpoint cache invalidation causes CPU spike (on vnode rings?)
Date Mon, 18 Nov 2013 18:53:23 GMT


Rick Branson commented on CASSANDRA-6345:

I like the simpler approach. I still think the callbacks for invalidation are asking for it
;) I also think the stampede lock should be a dedicated lock object rather than a synchronized
lock on "this", so that future modifications can't introduce unintended blocking.

Either way, the only material concern I have is the order in which TokenMetadata changes
get applied to the caches in AbstractReplicationStrategy instances. Shouldn't the invalidation
take place on all threads in all instances of AbstractReplicationStrategy before returning
from an endpoint-mutating write operation in TokenMetadata? It seems as if just setting the
cache to empty would leave a window in which a TokenMetadata write method has returned
but not all threads have seen the mutation yet, because they are still holding onto the old
clone of TM. This might be alright though, I'm not sure. Thoughts?
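
To make the ordering question concrete, here is a minimal sketch of one way to guarantee that a snapshot requested after a write has returned always reflects that write: bump a version number inside the write and compare it on every read. This is an illustration only, not the code in any of the attached patches. VersionedRing, Snapshot, updateToken and snapshot are hypothetical names, and a plain TreeMap stands in for TokenMetadata. The lock is a private ReentrantLock rather than a monitor on "this".

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for TokenMetadata plus its cached clone; not Cassandra code.
final class VersionedRing {
    // Immutable copy of the ring, tagged with the version it was cloned at.
    static final class Snapshot {
        final long version;
        final SortedMap<Long, String> tokens;

        Snapshot(long version, SortedMap<Long, String> tokens) {
            this.version = version;
            this.tokens = tokens;
        }
    }

    // Dedicated private lock, so callers can never contend on it by accident.
    private final ReentrantLock lock = new ReentrantLock();
    private final TreeMap<Long, String> ring = new TreeMap<>(); // guarded by lock
    private volatile long version;    // written only while holding lock
    private volatile Snapshot cached; // written only while holding lock

    // The version is bumped before this method returns, so any snapshot()
    // call that starts afterwards sees a mismatch and re-clones.
    void updateToken(long token, String endpoint) {
        lock.lock();
        try {
            ring.put(token, endpoint);
            version = version + 1;
        } finally {
            lock.unlock();
        }
    }

    Snapshot snapshot() {
        Snapshot s = cached;
        if (s != null && s.version == version) {
            return s; // fast path: the cached clone is still current
        }
        lock.lock();
        try {
            s = cached;
            if (s == null || s.version != version) {
                // Only one thread pays for the clone; the others wait on the lock.
                s = new Snapshot(version,
                        Collections.unmodifiableSortedMap(new TreeMap<>(ring)));
                cached = s;
            }
            return s;
        } finally {
            lock.unlock();
        }
    }
}
```

A thread that fetched a snapshot before the write returned still holds the old copy until it asks again, so this sketch only narrows the window to operations already in flight. Whether that is acceptable is the open question above.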

> Endpoint cache invalidation causes CPU spike (on vnode rings?)
> --------------------------------------------------------------
>                 Key: CASSANDRA-6345
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: 30 nodes total, 2 DCs
> Cassandra 1.2.11
> vnodes enabled (256 per node)
>            Reporter: Rick Branson
>            Assignee: Jonathan Ellis
>             Fix For: 1.2.12, 2.0.3
>         Attachments: 6345-rbranson-v2.txt, 6345-rbranson.txt, 6345-v2.txt, 6345-v3.txt,
>                      6345.txt, half-way-thru-6345-rbranson-patch-applied.png
>
> We've observed that events which cause invalidation of the endpoint cache (update keyspace,
> add/remove nodes, etc) in AbstractReplicationStrategy result in several seconds of thundering
> herd behavior on the entire cluster.
>
> A thread dump shows over a hundred threads (I stopped counting at that point) with a
> backtrace like this:
>         at
>         at org.apache.cassandra.locator.TokenMetadata$
>         at org.apache.cassandra.locator.TokenMetadata$
>         at java.util.TreeMap.getEntryUsingComparator(
>         at java.util.TreeMap.getEntry(
>         at java.util.TreeMap.get(
>         at
>         at
>         at
>         at
>         at
>         at org.apache.cassandra.utils.SortedBiMultiValMap.create(
>         at org.apache.cassandra.locator.TokenMetadata.cloneOnlyTokenMap(
>         at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(
>         at org.apache.cassandra.service.StorageService.getNaturalEndpoints(
>         at org.apache.cassandra.service.StorageProxy.performWrite(
>
> It looks like TokenMetadata.cloneOnlyTokenMap is expensive, and
> AbstractReplicationStrategy.getNaturalEndpoints calls it every time there is a cache miss
> for an endpoint. It seems as if this would only impact clusters with large numbers of tokens,
> so it's probably a vnodes-only issue.
>
> Proposal: In AbstractReplicationStrategy.getNaturalEndpoints(), cache the cloned TokenMetadata
> instance returned by TokenMetadata.cloneOnlyTokenMap(), wrapping it with a lock to prevent
> stampedes, and clearing it in clearEndpointCache(). Thoughts?
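
The proposal in the description can be sketched roughly as follows: keep one cached clone, let a single thread rebuild it after invalidation while the others wait, and clear it when the ring changes. This is a minimal illustration under stated assumptions, not the attached patch. RingSnapshotCache, updateToken, clearCache and snapshot are hypothetical names, a TreeMap stands in for the token map, and cloneCount exists only to make the number of clones observable.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the proposal; none of these names come from Cassandra.
final class RingSnapshotCache {
    private final TreeMap<Long, String> liveRing = new TreeMap<>(); // token -> endpoint
    private final ReentrantLock cloneLock = new ReentrantLock();
    private volatile SortedMap<Long, String> cachedClone; // null means invalidated
    final AtomicInteger cloneCount = new AtomicInteger(); // instrumentation only

    void updateToken(long token, String endpoint) {
        synchronized (liveRing) {
            liveRing.put(token, endpoint);
        }
        clearCache();
    }

    // Takes the clone lock so that a clone started before the update cannot be
    // published after the cache was cleared and then linger as a stale entry.
    void clearCache() {
        cloneLock.lock();
        try {
            cachedClone = null;
        } finally {
            cloneLock.unlock();
        }
    }

    SortedMap<Long, String> snapshot() {
        SortedMap<Long, String> snap = cachedClone;
        if (snap != null) {
            return snap; // cache hit, no locking
        }
        cloneLock.lock();
        try {
            snap = cachedClone; // re-check: another thread may have cloned already
            if (snap == null) {
                synchronized (liveRing) {
                    snap = Collections.unmodifiableSortedMap(new TreeMap<>(liveRing));
                }
                cloneCount.incrementAndGet();
                cachedClone = snap;
            }
            return snap;
        } finally {
            cloneLock.unlock();
        }
    }
}
```

With this shape, a burst of cache misses after an invalidation costs one clone instead of one clone per thread. The trade-off is that threads arriving during the rebuild block briefly on cloneLock.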

This message was sent by Atlassian JIRA
