cassandra-commits mailing list archives

From "David King (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-2058) Nodes periodically spike in load
Date Wed, 26 Jan 2011 06:49:47 GMT


David King commented on CASSANDRA-2058:

bq. You were running 0.6.8 + DS before? Or is "it" not DynamicSnitch?

I was running 0.6.8 with no DES. Then I upgraded to 0.6.10 and turned it on. I had the aforementioned

Now I'm running 0.6.10 with the DES turned off. (As of this writing, I'm still seeing the
momentary spikes but thus far no sustained ones.)

If the momentary or sustained spikes continue (I'll probably know by the morning),
then I'll revert to 0.6.8 and turn *on* the DES.

If after that I continue to have problems, I'll revert to 0.6.8 with no DES, which is
at least a configuration in which I didn't have any of these problems.

> Nodes periodically spike in load
> --------------------------------
>                 Key: CASSANDRA-2058
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6.10
>            Reporter: David King
>         Attachments: cassandra.pmc01.log.bz2, cassandra.pmc14.log.bz2, graph a.png, graph
> (Filing as a placeholder bug as I gather information.)
> At ~10p 24 Jan, I upgraded our 20-node cluster from 0.6.8->0.6.10, turned on the DES,
> and moved some CFs from one KS into another (drain whole cluster, take it down, move files,
> change schema, put it back up). Since then, I've had four storms whereby a node's load will
> shoot to 700+ (400% CPU on a 4-cpu machine) and become totally unresponsive. After a moment
> or two like that, its neighbour dies too, and the failure cascades around the ring. Unfortunately
> because of the high load I'm not able to get into the machine to pull a thread dump to see
> wtf it's doing as it happens.
> I've also had an issue where a single node spikes up to high load, but recovers. This
> may or may not be the same issue from which the nodes don't recover as above, but both are
> new behaviour.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
