cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6345) Endpoint cache invalidation causes CPU spike (on vnode rings?)
Date Wed, 20 Nov 2013 04:27:26 GMT


Jonathan Ellis commented on CASSANDRA-6345:

bq. It seems as if just setting the cache to empty would allow a period of time where TokenMetadata
write methods had returned but not all threads have seen the mutation yet

I'm not 100% sure this is what you're talking about, but I see this problem with the existing
code (and my v3):

Thread 1                                                  Thread 2
endpoints = calculate
                                                          invalidate
cacheEndpoint [based on the now-invalidated token map]
So it doesn't quite work. We'd need to introduce another AtomicReference on the cache, so
that invalidate could create a new Map (so it doesn't matter if someone updates the old one).
But I think you're right that getting rid of the callback approach entirely is better.
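To make the idea concrete, here is a minimal sketch of the AtomicReference approach (not Cassandra's actual code; the class and method names are invented for illustration). A computing thread captures the current map, and invalidate swaps in a fresh map, so a late write against the old map is never visible to readers:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: the endpoint cache lives behind an AtomicReference.
public class EndpointCache {
    private final AtomicReference<Map<String, String>> cache =
            new AtomicReference<>(new ConcurrentHashMap<>());

    // A computing thread captures the map up front and later writes its
    // result into that same captured map (the cacheEndpoint step above).
    public Map<String, String> snapshot() {
        return cache.get();
    }

    public String get(String token) {
        return cache.get().get(token);
    }

    // invalidate() swaps in a brand-new map: a write that races with
    // invalidation lands in the discarded map, which no reader consults.
    public void invalidate() {
        cache.set(new ConcurrentHashMap<>());
    }
}
```

With this shape, the interleaving in the diagram becomes harmless: Thread 1's stale cacheEndpoint write goes into the old map object, and every subsequent lookup sees only the fresh, empty map.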

> Endpoint cache invalidation causes CPU spike (on vnode rings?)
> --------------------------------------------------------------
>                 Key: CASSANDRA-6345
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: 30 nodes total, 2 DCs
> Cassandra 1.2.11
> vnodes enabled (256 per node)
>            Reporter: Rick Branson
>            Assignee: Jonathan Ellis
>             Fix For: 1.2.12, 2.0.3
>         Attachments: 6345-rbranson-v2.txt, 6345-rbranson.txt, 6345-v2.txt, 6345-v3.txt,
6345.txt, half-way-thru-6345-rbranson-patch-applied.png
> We've observed that events which cause invalidation of the endpoint cache (update keyspace,
add/remove nodes, etc.) in AbstractReplicationStrategy result in several seconds of thundering
herd behavior on the entire cluster.
> A thread dump shows over a hundred threads (I stopped counting at that point) with a
backtrace like this:
>         at
>         at org.apache.cassandra.locator.TokenMetadata$
>         at org.apache.cassandra.locator.TokenMetadata$
>         at java.util.TreeMap.getEntryUsingComparator(
>         at java.util.TreeMap.getEntry(
>         at java.util.TreeMap.get(
>         at
>         at
>         at
>         at
>         at
>         at org.apache.cassandra.utils.SortedBiMultiValMap.create(
>         at org.apache.cassandra.locator.TokenMetadata.cloneOnlyTokenMap(
>         at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(
>         at org.apache.cassandra.service.StorageService.getNaturalEndpoints(
>         at org.apache.cassandra.service.StorageProxy.performWrite(
> It looks like there's a large amount of cost in TokenMetadata.cloneOnlyTokenMap, which
AbstractReplicationStrategy.getNaturalEndpoints calls each time there is a cache miss for an
endpoint. This would only impact clusters with large numbers of tokens, so it's probably a
vnodes-only issue.
> Proposal: In AbstractReplicationStrategy.getNaturalEndpoints(), cache the cloned TokenMetadata
instance returned by TokenMetadata.cloneOnlyTokenMap(), wrapping it with a lock to prevent
stampedes, and clearing it in clearEndpointCache(). Thoughts?
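The proposal above can be sketched as follows (a hypothetical model, not Cassandra's code; TokenMapCache, expensiveClone, and the version counter are invented stand-ins for TokenMetadata.cloneOnlyTokenMap and clearEndpointCache). The lock is taken only on a miss, so concurrent misses no longer all pay for the clone, which is the thundering-herd fix:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: cache the expensive clone, lock only on a miss.
public class TokenMapCache {
    // Stand-in for the expensive cloneOnlyTokenMap() result.
    static class TokenMap { final long version; TokenMap(long v) { version = v; } }

    private final AtomicReference<TokenMap> cached = new AtomicReference<>();
    private final Object lock = new Object();
    private long cloneCount = 0;

    // Models cloneOnlyTokenMap(); counts how often the expensive copy runs.
    TokenMap expensiveClone() {
        cloneCount++;
        return new TokenMap(cloneCount);
    }

    public TokenMap get() {
        TokenMap tm = cached.get();
        if (tm != null)
            return tm;                    // fast path: no locking on a hit
        synchronized (lock) {             // only one thread pays for the clone
            tm = cached.get();
            if (tm == null) {
                tm = expensiveClone();
                cached.set(tm);
            }
            return tm;
        }
    }

    public void invalidate() {            // models clearEndpointCache()
        cached.set(null);
    }

    public long clones() { return cloneCount; }
}
```

Under this scheme, a hundred threads hitting an invalidated cache produce one clone instead of a hundred; the rest block briefly on the lock and then reuse the cached copy.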

This message was sent by Atlassian JIRA
