Date: Tue, 28 Aug 2012 01:35:07 +1100 (NCT)
From: "Jonathan Ellis (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-4578) Dead lock in mutation stage when many concurrent writes to few columns

     [ https://issues.apache.org/jira/browse/CASSANDRA-4578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-4578:
--------------------------------------

             Priority: Minor  (was: Major)
    Affects Version/s:     (was: 1.1.3)
                       0.8.0
        Fix Version/s: 1.1.5
             Assignee: Sylvain Lebresne

You're right: since CMVH (CounterMutationVerbHandler) holds a writer thread until it gets replies from the other replicas, two replicas can deadlock, with A waiting for a reply from B and B waiting for a reply from A.
One fix would be to move the local write into CMVH and the remote part into a separate stage (or maybe just a custom callback).

As a workaround, use CL.ONE with counters.

> Dead lock in mutation stage when many concurrent writes to few columns
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-4578
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4578
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>         Environment: 15 cassandra instances
> CentOS5
> 8 Core 64GB Memory
> java version "1.6.0_33"
> Java(TM) SE Runtime Environment (build 1.6.0_33-b04)
> Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03, mixed mode)
>            Reporter: Suguru Namura
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 1.1.5
>
>         Attachments: threaddump-1344957574788.tdump
>
>
> When I send many requests to increment a few counter columns, the mutation stage sometimes deadlocks. When that happens, all mutation threads are blocked and no longer accept updates.
> {noformat}
> "MutationStage:432" - Thread t@1389
>    java.lang.Thread.State: TIMED_WAITING
>     at java.lang.Object.wait(Native Method)
>     - waiting on (a org.apache.cassandra.utils.SimpleCondition)
>     at java.lang.Object.wait(Object.java:443)
>     at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:292)
>     at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:54)
>     at org.apache.cassandra.service.AbstractWriteResponseHandler.get(AbstractWriteResponseHandler.java:55)
>     at org.apache.cassandra.db.CounterMutationVerbHandler.doVerb(CounterMutationVerbHandler.java:51)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>    Locked ownable synchronizers:
>     - locked <4b1b0a6f> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> {noformat}
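
To make the A-waits-for-B, B-waits-for-A cycle concrete, here is a minimal standalone sketch. It is not Cassandra code: the class and method names are hypothetical, each "mutation stage" is assumed to be a one-thread pool, and the remote write is an empty task. It contrasts a leader that blocks on the peer's reply (as the thread dump shows doVerb doing) with one that hands the remote part off and returns, roughly in the spirit of the "custom callback" idea.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ReplicaDeadlockSketch {

    // Hypothetical stand-in for the remote half of a counter write; it needs a peer stage thread.
    static CompletableFuture<Void> replicateOn(ExecutorService peerStage) {
        return CompletableFuture.runAsync(() -> { /* apply the write on the peer */ }, peerStage);
    }

    // Blocking leader: keeps its own stage thread while waiting for the peer's reply.
    // Returns true if the wait timed out.
    static boolean blockingLeader(ExecutorService peerStage, CountDownLatch bothBusy, long timeoutMs)
            throws Exception {
        bothBusy.countDown();
        bothBusy.await(); // from here on, both stage threads are held by leaders
        try {
            replicateOn(peerStage).get(timeoutMs, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        }
    }

    // Callback leader: hands the remote half to the peer and returns at once, freeing its thread.
    static CompletableFuture<Void> callbackLeader(ExecutorService peerStage, CountDownLatch bothBusy)
            throws Exception {
        bothBusy.countDown();
        bothBusy.await();
        return replicateOn(peerStage);
    }

    // True if at least one blocking leader timed out. Neither reply can arrive until the
    // other leader releases its thread, so whichever wait ends first ends by timeout.
    static boolean blockingLeadersStall(long timeoutMs) throws Exception {
        ExecutorService stageA = Executors.newFixedThreadPool(1);
        ExecutorService stageB = Executors.newFixedThreadPool(1);
        CountDownLatch bothBusy = new CountDownLatch(2);
        try {
            Future<Boolean> a = stageA.submit(() -> blockingLeader(stageB, bothBusy, timeoutMs));
            Future<Boolean> b = stageB.submit(() -> blockingLeader(stageA, bothBusy, timeoutMs));
            boolean aTimedOut = a.get();
            boolean bTimedOut = b.get();
            return aTimedOut || bTimedOut;
        } finally {
            stageA.shutdownNow();
            stageB.shutdownNow();
        }
    }

    // True if both peers reply within timeoutMs when the leaders do not block.
    static boolean callbackLeadersComplete(long timeoutMs) throws Exception {
        ExecutorService stageA = Executors.newFixedThreadPool(1);
        ExecutorService stageB = Executors.newFixedThreadPool(1);
        CountDownLatch bothBusy = new CountDownLatch(2);
        try {
            Future<CompletableFuture<Void>> a = stageA.submit(() -> callbackLeader(stageB, bothBusy));
            Future<CompletableFuture<Void>> b = stageB.submit(() -> callbackLeader(stageA, bothBusy));
            CompletableFuture.allOf(a.get(), b.get()).get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } finally {
            stageA.shutdownNow();
            stageB.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocking leaders stall: " + blockingLeadersStall(200));
        System.out.println("callback leaders complete: " + callbackLeadersComplete(2000));
    }
}
```

The latch only forces the unlucky interleaving in which both stage threads are already held by leaders before either remote write is queued. With larger pools the same stall needs as many concurrent leaders as there are threads, which fits the report's "many concurrent writes to few columns".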