Return-Path:
Delivered-To: apmail-incubator-cassandra-commits-archive@minotaur.apache.org
Received: (qmail 98756 invoked from network); 8 Feb 2010 18:49:53 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
    by minotaur.apache.org with SMTP; 8 Feb 2010 18:49:53 -0000
Received: (qmail 60436 invoked by uid 500); 8 Feb 2010 18:49:53 -0000
Delivered-To: apmail-incubator-cassandra-commits-archive@incubator.apache.org
Received: (qmail 60421 invoked by uid 500); 8 Feb 2010 18:49:53 -0000
Mailing-List: contact cassandra-commits-help@incubator.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: cassandra-dev@incubator.apache.org
Delivered-To: mailing list cassandra-commits@incubator.apache.org
Received: (qmail 60411 invoked by uid 99); 8 Feb 2010 18:49:53 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 08 Feb 2010 18:49:53 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED
X-Spam-Check-By: apache.org
Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 08 Feb 2010 18:49:52 +0000
Received: from brutus.apache.org (localhost [127.0.0.1])
    by brutus.apache.org (Postfix) with ESMTP id 21C21234C4A8
    for ; Mon, 8 Feb 2010 10:49:32 -0800 (PST)
Message-ID: <1386380679.126381265654972136.JavaMail.jira@brutus.apache.org>
Date: Mon, 8 Feb 2010 18:49:32 +0000 (UTC)
From: "Gary Dusbabek (JIRA)"
To: cassandra-commits@incubator.apache.org
Subject: [jira] Updated: (CASSANDRA-778) Gossiper thread deadlock
In-Reply-To: <1610845039.126361265654972108.JavaMail.jira@brutus.apache.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

     [ https://issues.apache.org/jira/browse/CASSANDRA-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Dusbabek updated CASSANDRA-778:
------------------------------------

    Attachment: 0001-fix-deadlock.patch

> Gossiper thread deadlock
> ------------------------
>
>                 Key: CASSANDRA-778
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-778
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>            Reporter: Gary Dusbabek
>            Assignee: Gary Dusbabek
>             Fix For: 0.6
>
>         Attachments: 0001-fix-deadlock.patch
>
>
> Found this while attempting to bootstrap a node with more than a trivial amount of data:
> Found one Java-level deadlock:
> =============================
> "GMFD:1":
>   waiting to lock monitor 0x0000000100861d60 (object 0x00000001066a7ed8, a org.apache.cassandra.service.StorageService),
>   which is held by "main"
> "main":
>   waiting to lock monitor 0x0000000100860710 (object 0x0000000106c7c968, a org.apache.cassandra.gms.Gossiper),
>   which is held by "GMFD:1"
> Java stack information for the threads listed above:
> ===================================================
> "GMFD:1":
> 	at org.apache.cassandra.service.StorageService.getReplicationStrategy(StorageService.java:226)
> 	- waiting to lock <0x00000001066a7ed8> (a org.apache.cassandra.service.StorageService)
> 	at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:634)
> 	at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:502)
> 	at org.apache.cassandra.service.StorageService.onChange(StorageService.java:445)
> 	at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:812)
> 	at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:607)
> 	at org.apache.cassandra.gms.Gossiper.handleNewJoin(Gossiper.java:582)
> 	at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:649)
> 	- locked <0x0000000106c7c968> (a org.apache.cassandra.gms.Gossiper)
> 	at org.apache.cassandra.gms.Gossiper$GossipDigestAck2VerbHandler.doVerb(Gossiper.java:1061)
> 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:637)
> "main":
> 	at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:861)
> 	- waiting to lock <0x0000000106c7c968> (a org.apache.cassandra.gms.Gossiper)
> 	at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:347)
> 	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:318)
> 	- locked <0x00000001066a7ed8> (a org.apache.cassandra.service.StorageService)
> 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:99)
> 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:174)
> Found 1 deadlock.
>
> main acquires SS lock and doesn't release it before attempting to acquire the Gossiper lock. Meanwhile, the gossip stage acquires the Gossiper lock and then attempts to acquire the SS lock.
>
> Solution is to have finer-grained locking on the resource in SS (map of replication strategies), or to move the collection to a different class (DD maybe?). This was introduced in CASSANDRA-620.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
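
The lock-order inversion described in the report, and the "finer-grained locking" option it proposes, can be sketched in a few lines of standalone Java. This is only an illustration and not the contents of 0001-fix-deadlock.patch: the class LockOrderSketch, the Service stand-in, the plain Object used as the gossiper monitor, and the table and strategy strings are all hypothetical. The idea is that the strategy map is guarded by its own concurrency control (here a ConcurrentHashMap, chosen for the sketch) so that a reader holding the gossiper lock never has to wait for the service monitor.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class LockOrderSketch {
    /** Hypothetical stand-in for the storage service; not Cassandra code. */
    static class Service {
        // The map has its own thread safety, independent of the Service monitor.
        private final Map<String, String> strategies = new ConcurrentHashMap<>();

        void register(String table, String strategy) {
            strategies.put(table, strategy);
        }

        // Deliberately NOT synchronized on `this`. Declaring this method
        // `synchronized` would recreate the deadlock in the scenario below.
        String getStrategy(String table) {
            return strategies.get(table);
        }
    }

    /** Forces the interleaving from the report; true if both threads finish in time. */
    static boolean runScenario(long timeoutMillis) {
        final Service service = new Service();
        final Object gossiper = new Object(); // hypothetical gossiper monitor
        service.register("exampleTable", "exampleStrategy");
        final CountDownLatch serviceHeld = new CountDownLatch(1);
        final CountDownLatch gossiperHeld = new CountDownLatch(1);

        // Like "main": takes the service monitor first, then the gossiper lock.
        Thread boot = new Thread(() -> {
            synchronized (service) {
                serviceHeld.countDown();
                awaitQuietly(gossiperHeld);
                synchronized (gossiper) {
                    // local state would be published here
                }
            }
        }, "main-like");

        // Like "GMFD:1": takes the gossiper lock first, then reads a strategy.
        Thread gossip = new Thread(() -> {
            synchronized (gossiper) {
                gossiperHeld.countDown();
                awaitQuietly(serviceHeld);
                service.getStrategy("exampleTable"); // needs no service monitor
            }
        }, "gossip-like");

        boot.setDaemon(true);   // daemon threads so a deadlocked variant can still exit
        gossip.setDaemon(true);
        boot.start();
        gossip.start();
        try {
            boot.join(timeoutMillis);
            gossip.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !boot.isAlive() && !gossip.isAlive();
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed without deadlock: " + runScenario(2000));
    }
}
```

The two latches make each thread hold its first lock before either attempts its second step, which is the interleaving shown in the thread dump. Because the read path no longer needs the service monitor, the gossip-like thread finishes and releases the gossiper lock, and the main-like thread can then proceed.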