Date: Mon, 18 Nov 2013 18:53:23 +0000 (UTC)
From: "Rick Branson (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-6345) Endpoint cache invalidation causes CPU spike (on vnode rings?)

    [ https://issues.apache.org/jira/browse/CASSANDRA-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825614#comment-13825614 ]

Rick Branson commented on CASSANDRA-6345:
-----------------------------------------

I like the simpler approach. I still think the callbacks for invalidation are asking for it ;) I also think the stampede lock should perhaps be more explicit than a synchronized lock on "this", to prevent unintended blocking from future modifications.

Either way, the only material concern I have is the order in which TokenMetadata changes get applied to the caches in AbstractReplicationStrategy instances. Shouldn't the invalidation take place on all threads, in all instances of AbstractReplicationStrategy, before an endpoint-mutating write operation in TokenMetadata returns? It seems as if just setting the cache to empty would leave a window in which the TokenMetadata write methods have returned but not all threads have seen the mutation yet, because they are still holding onto the old clone of TM. This might be alright, though; I'm not sure. Thoughts?
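For concreteness, a minimal sketch of the "more explicit lock" idea. The class and member names here are hypothetical, not taken from the attached patches; only TokenMetadata.cloneOnlyTokenMap() is the real API (it appears in the backtrace quoted below):

{code:java}
import org.apache.cassandra.locator.TokenMetadata;

// Sketch only: a dedicated lock object instead of synchronized(this), so
// future code that synchronizes on the strategy instance cannot
// accidentally contend with (or deadlock against) cache rebuilds.
public abstract class AbstractReplicationStrategySketch
{
    private final Object tokenMapLock = new Object();
    private volatile TokenMetadata cachedTokenMap;

    protected TokenMetadata getOrCloneTokenMap(TokenMetadata tm)
    {
        TokenMetadata cached = cachedTokenMap;
        if (cached != null)
            return cached;

        synchronized (tokenMapLock)
        {
            // Re-check under the lock so only one thread pays for the
            // expensive cloneOnlyTokenMap(); the rest block briefly here
            // instead of all cloning at once (the stampede).
            if (cachedTokenMap == null)
                cachedTokenMap = tm.cloneOnlyTokenMap();
            return cachedTokenMap;
        }
    }

    // The ordering concern above, in code: this returns immediately, but a
    // reader that already passed the null check keeps using the old clone
    // until its current call completes, so TokenMetadata's write methods
    // can return before every thread has observed the mutation.
    public void invalidateTokenMapCache()
    {
        cachedTokenMap = null;
    }
}
{code}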
> Endpoint cache invalidation causes CPU spike (on vnode rings?)
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-6345
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6345
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: 30 nodes total, 2 DCs
>                      Cassandra 1.2.11
>                      vnodes enabled (256 per node)
>            Reporter: Rick Branson
>            Assignee: Jonathan Ellis
>             Fix For: 1.2.12, 2.0.3
>
>         Attachments: 6345-rbranson-v2.txt, 6345-rbranson.txt, 6345-v2.txt, 6345-v3.txt, 6345.txt, half-way-thru-6345-rbranson-patch-applied.png
>
>
> We've observed that events which cause invalidation of the endpoint cache (update keyspace, add/remove nodes, etc.) in AbstractReplicationStrategy result in several seconds of thundering-herd behavior on the entire cluster.
> A thread dump shows over a hundred threads (I stopped counting at that point) with a backtrace like this:
>     at java.net.Inet4Address.getAddress(Inet4Address.java:288)
>     at org.apache.cassandra.locator.TokenMetadata$1.compare(TokenMetadata.java:106)
>     at org.apache.cassandra.locator.TokenMetadata$1.compare(TokenMetadata.java:103)
>     at java.util.TreeMap.getEntryUsingComparator(TreeMap.java:351)
>     at java.util.TreeMap.getEntry(TreeMap.java:322)
>     at java.util.TreeMap.get(TreeMap.java:255)
>     at com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:200)
>     at com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:117)
>     at com.google.common.collect.TreeMultimap.put(TreeMultimap.java:74)
>     at com.google.common.collect.AbstractMultimap.putAll(AbstractMultimap.java:273)
>     at com.google.common.collect.TreeMultimap.putAll(TreeMultimap.java:74)
>     at org.apache.cassandra.utils.SortedBiMultiValMap.create(SortedBiMultiValMap.java:60)
>     at org.apache.cassandra.locator.TokenMetadata.cloneOnlyTokenMap(TokenMetadata.java:598)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:104)
>     at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:2671)
>     at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:375)
> It looks like there's a large amount of cost in the TokenMetadata.cloneOnlyTokenMap call that AbstractReplicationStrategy.getNaturalEndpoints makes each time there is a cache miss for an endpoint. This would only impact clusters with large numbers of tokens, so it's probably a vnodes-only issue.
> Proposal: in AbstractReplicationStrategy.getNaturalEndpoints(), cache the cloned TokenMetadata instance returned by TokenMetadata.cloneOnlyTokenMap(), wrapping it with a lock to prevent stampedes, and clearing it in clearEndpointCache(). Thoughts?
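And a sketch of the quoted proposal itself, building on the lock sketch above. Here cachedEndpoints and the tokenMetadata field are simplified stand-ins, calculateNaturalEndpoints is the strategy's existing abstract method, and the committed patch may well differ:

{code:java}
// Additional members for the AbstractReplicationStrategySketch class above
// (assumes imports: java.net.InetAddress, java.util.ArrayList, java.util.List,
// java.util.Map, java.util.concurrent.ConcurrentHashMap,
// org.apache.cassandra.dht.RingPosition, org.apache.cassandra.dht.Token).

// Simplified stand-in for the per-strategy endpoint cache.
private final Map<Token, ArrayList<InetAddress>> cachedEndpoints =
        new ConcurrentHashMap<Token, ArrayList<InetAddress>>();

// The strategy's live ring state; in the real class this is set in the
// constructor.
protected TokenMetadata tokenMetadata;

// Declared abstract in the real class: each strategy computes replicas
// its own way.
public abstract List<InetAddress> calculateNaturalEndpoints(Token searchToken, TokenMetadata tm);

public ArrayList<InetAddress> getNaturalEndpoints(RingPosition searchPosition)
{
    Token searchToken = searchPosition.getToken();
    Token keyToken = TokenMetadata.firstToken(tokenMetadata.sortedTokens(), searchToken);
    ArrayList<InetAddress> endpoints = cachedEndpoints.get(keyToken);
    if (endpoints == null)
    {
        // Cache miss: compute from the shared clone, built at most once per
        // invalidation cycle, instead of calling cloneOnlyTokenMap() per miss.
        TokenMetadata tm = getOrCloneTokenMap(tokenMetadata);
        endpoints = new ArrayList<InetAddress>(calculateNaturalEndpoints(searchToken, tm));
        cachedEndpoints.put(keyToken, endpoints);
    }
    // Defensive copy so callers can't mutate the cached list.
    return new ArrayList<InetAddress>(endpoints);
}

// Per the proposal: clearing the endpoint cache also drops the cached clone,
// subject to the invalidation-ordering caveat noted in the comment above.
public void clearEndpointCache()
{
    cachedEndpoints.clear();
    invalidateTokenMapCache();
}
{code}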