Date: Tue, 15 Feb 2011 18:12:04 +0000 (UTC)
From: "Brandon Williams (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] Updated: (CASSANDRA-2058) Nodes periodically spike in load

[
https://issues.apache.org/jira/browse/CASSANDRA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-2058:
----------------------------------------

    Fix Version/s:     (was: 0.7.1)
                   0.7.2

> Nodes periodically spike in load
> --------------------------------
>
>                 Key: CASSANDRA-2058
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2058
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6.10, 0.7.1
>         Environment: OpenJDK 64-Bit Server VM (build 1.6.0_0-b12, mixed mode)
> Ubuntu 8.10
> Linux pmc01 2.6.27-22-xen #1 SMP Fri Feb 20 23:58:13 UTC 2009 x86_64 GNU/Linux
>            Reporter: David King
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.11, 0.7.2
>
>         Attachments: 2058-0.7-v2.txt, 2058-0.7-v3.txt, 2058-0.7.txt, 2058.txt, cassandra.pmc01.log.bz2, cassandra.pmc14.log.bz2, graph a.png, graph b.png
>
>
> (Filing as a placeholder bug as I gather information.)
> At ~10pm on 24 Jan, I upgraded our 20-node cluster from 0.6.8 to 0.6.10, turned on the DES, and moved some CFs from one KS into another (drain the whole cluster, take it down, move files, change the schema, bring it back up). Since then, I've had four storms in which a node's load shoots to 700+ (400% CPU on a 4-CPU machine) and the node becomes totally unresponsive. After a moment or two like that, its neighbour dies too, and the failure cascades around the ring. Unfortunately, because of the high load, I'm not able to get into the machine to pull a thread dump and see what it's doing as it happens.
> I've also had an issue where a single node spikes to high load but recovers. This may or may not be the same issue as the one above from which the nodes don't recover, but both are new behaviour.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira