cassandra-commits mailing list archives

From "Thibaut (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-2054) Cpu Spike to > 100%.
Date Wed, 26 Jan 2011 15:33:44 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987055#action_12987055 ]

Thibaut commented on CASSANDRA-2054:
------------------------------------

The jstack was taken after the node had returned to its "normal" state. I'm unable to run jstack while the node is consuming all the CPU.
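When jstack cannot attach to a saturated JVM, the usual fallbacks are the forced mode that its own error message suggests and a SIGQUIT thread dump. A minimal sketch, using the pid 27699 from the report below (substitute your own):

```shell
# Two common fallbacks when a plain jstack attach fails.
jstack -F 27699 > /tmp/jstack-forced.txt  # forced attach via the -F flag
kill -3 27699                             # SIGQUIT: the JVM writes a thread
                                          # dump to its own stdout/log file
```

Note that `kill -3` does not stop the process; the dump lands in the JVM's stdout (for Cassandra, typically the system log), not in the shell that sent the signal.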

The connections could also all come from my application's threads (it is multi-threaded) trying to connect while the node is in the "not normal" state, since the node is still marked as up.


> Cpu Spike to > 100%. 
> ---------------------
>
>                 Key: CASSANDRA-2054
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2054
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Thibaut
>         Attachments: gc.log, jstack.txt, jstackerror.txt
>
>
> I see sudden spikes of CPU usage where Cassandra will consume an enormous amount of CPU (uptime load > 1000).
> My application executes both reads and writes.
> I tested this with https://hudson.apache.org/hudson/job/Cassandra-0.7/193/artifact/cassandra/build/apache-cassandra-2011-01-24_06-01-26-bin.tar.gz.
> I disabled JNA, but this didn't help.
> Jstack won't work anymore when this happens:
> -bash-4.1# jstack 27699 > /tmp/jstackerror
> 27699: Unable to open socket file: target process not responding or HotSpot VM not loaded
> The -F option can be used when the target process is not responding
> Also, my entire application comes to a halt as long as the node is in this state: the node is still marked as up, but will not respond to any requests (Cassandra is taking up all the CPU on the first node).
> /software/cassandra/bin/nodetool -h localhost ring
> Address       Status  State   Load     Owns    Token
>                                                ffffffffffffffff
> 192.168.0.1   Up      Normal  3.48 GB  5.00%   0cc
> 192.168.0.2   Up      Normal  3.48 GB  5.00%   199
> 192.168.0.3   Up      Normal  3.67 GB  5.00%   266
> 192.168.0.4   Up      Normal  2.55 GB  5.00%   333
> 192.168.0.5   Up      Normal  2.58 GB  5.00%   400
> 192.168.0.6   Up      Normal  2.54 GB  5.00%   4cc
> 192.168.0.7   Up      Normal  2.59 GB  5.00%   599
> 192.168.0.8   Up      Normal  2.58 GB  5.00%   666
> 192.168.0.9   Up      Normal  2.33 GB  5.00%   733
> 192.168.0.10  Down    Normal  2.39 GB  5.00%   7ff
> 192.168.0.11  Up      Normal  2.4 GB   5.00%   8cc
> 192.168.0.12  Up      Normal  2.74 GB  5.00%   999
> 192.168.0.13  Up      Normal  3.17 GB  5.00%   a66
> 192.168.0.14  Up      Normal  3.25 GB  5.00%   b33
> 192.168.0.15  Up      Normal  3.01 GB  5.00%   c00
> 192.168.0.16  Up      Normal  2.48 GB  5.00%   ccc
> 192.168.0.17  Up      Normal  2.41 GB  5.00%   d99
> 192.168.0.18  Up      Normal  2.3 GB   5.00%   e66
> 192.168.0.19  Up      Normal  2.27 GB  5.00%   f33
> 192.168.0.20  Up      Normal  2.32 GB  5.00%   ffffffffffffffff
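Ring output like the above can be filtered mechanically to spot nodes marked Down. A minimal sketch, assuming the default column order (Address first, Status second):

```shell
# Print the address of every node nodetool reports as Down
# (column positions assumed from the ring output above).
/software/cassandra/bin/nodetool -h localhost ring \
  | awk '$2 == "Down" { print $1 }'
```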
> The interesting part is that after a while (seconds or minutes), I have seen Cassandra nodes return to a normal state again (without a restart). I have also never seen this happen on 2 nodes in the cluster at the same time (the node where it happens differs, but there seems to be a pattern: it happens on the first node most of the time).
> In the above case, I restarted node 192.168.0.10 and the first node returned to its normal state. (I don't know if there is a correlation.)
> I attached the jstack of the node in trouble (taken as soon as I could access it with jstack, but I suspect it is from after the node was running normally again).
> The heap usage is still moderate:
> /software/cassandra/bin/nodetool -h localhost info
> 0cc
> Gossip active    : true
> Load             : 3.49 GB
> Generation No    : 1295949691
> Uptime (seconds) : 42843
> Heap Memory (MB) : 1570.58 / 3005.38
> I will enable the GC logging tomorrow.
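For reference, the GC logging mentioned above is usually enabled with JVM flags along these lines. A sketch for a 0.7-era conf/cassandra-env.sh; the log path is an assumption:

```shell
# Hypothetical addition to conf/cassandra-env.sh: verbose GC logging,
# so CPU spikes can be correlated with collector activity in gc.log.
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCTimeStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
```

Long stop-the-world pauses or back-to-back full collections in that log would point at GC rather than application load as the source of the spike.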

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

