cassandra-commits mailing list archives

From "Thibaut (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (CASSANDRA-2058) Nodes periodically spike in load
Date Fri, 04 Feb 2011 09:28:27 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990493#comment-12990493 ]

Thibaut edited comment on CASSANDRA-2058 at 2/4/11 9:28 AM:
------------------------------------------------------------

I'm also seeing something similar on yesterday's svn version (the one with the Consistency
level fix).

It only occurs if I enable JNA.

Nodes will experience extremely high kernel load (htop, red bar). SSH sessions on these servers
will lag extremely. Nodes won't take 100% CPU though, but the cluster is unusable.

(Just to note: it's a completely different pattern from the 100% CPU spike which occurred before,
and I can't reproduce it without JNA enabled.)


      was (Author: tbritz):
    I'm also seeing something similar on yesterday's svn version (the one with the Consistency
level fix).

It only occurs if I enable JNA.

Nodes will experience extremely high kernel load (htop, red bar). SSH sessions on these servers
will lag extremely. Nodes won't take 100% CPU though, but the cluster is unusable.



  
> Nodes periodically spike in load
> --------------------------------
>
>                 Key: CASSANDRA-2058
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2058
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6.10, 0.7.1
>         Environment: OpenJDK 64-Bit Server VM (build 1.6.0_0-b12, mixed mode)
> Ubuntu 8.10
> Linux pmc01 2.6.27-22-xen #1 SMP Fri Feb 20 23:58:13 UTC 2009 x86_64 GNU/Linux
>            Reporter: David King
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.11, 0.7.1
>
>         Attachments: 2058-0.7-v2.txt, 2058-0.7-v3.txt, 2058-0.7.txt, 2058.txt, cassandra.pmc01.log.bz2,
> cassandra.pmc14.log.bz2, graph a.png, graph b.png
>
>
> (Filing as a placeholder bug as I gather information.)
> At ~10p 24 Jan, I upgraded our 20-node cluster from 0.6.8->0.6.10, turned on the DES,
> and moved some CFs from one KS into another (drain whole cluster, take it down, move files,
> change schema, put it back up). Since then, I've had four storms whereby a node's load will
> shoot to 700+ (400% CPU on a 4-cpu machine) and become totally unresponsive. After a moment
> or two like that, its neighbour dies too, and the failure cascades around the ring. Unfortunately
> because of the high load I'm not able to get into the machine to pull a thread dump to see
> wtf it's doing as it happens.
> I've also had an issue where a single node spikes up to high load, but recovers. This
> may or may not be the same issue from which the nodes don't recover as above, but both are
> new behaviour.


        
