cassandra-user mailing list archives

From Jeff Jirsa <jeff.ji...@crowdstrike.com>
Subject Re: High CPU usage on some of nodes
Date Thu, 10 Sep 2015 18:56:54 GMT
With a 5s collection, the problem is almost certainly GC. 

GC pressure can be caused by a number of things, including normal read/write loads, but ALSO
compaction calculation (pre-2.1.9 / #9882) and very large partitions (trying to load a very
large partition with something like row cache in 2.0 and earlier, or issuing a full row read
where the row is larger than you expect). 

You can try to tune the GC behavior, but the underlying problem may be something like a bad
data model (which Samuel suggested), and no amount of GC tuning is going to fix trying to
do bad things with very big rows. 



From:  Roman Tkachenko
Reply-To:  "user@cassandra.apache.org"
Date:  Thursday, September 10, 2015 at 10:54 AM
To:  "user@cassandra.apache.org"
Subject:  Re: High CPU usage on some of nodes

Thanks for the responses, guys.

I also suspected GC, and I guess that could be it: during the spikes, the logs are filled with
messages like "GC for ConcurrentMarkSweep: 5908 ms for 1 collections, 1986282520 used; max
is 8375238656", often right before messages about dropped queries. The other, unaffected
nodes only have messages like "GC for ParNew: 230 ms for 1 collections, 4418571760 used;
max is 8375238656".
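As a rough way to compare affected and unaffected nodes, the GC messages quoted above can be pulled out of system.log and filtered by pause length. The following is a minimal sketch, assuming each message sits on a single log line in exactly the format quoted in this thread (other Cassandra versions may log GC differently). The names `GC_LINE`, `GcEvent`, `parse_gc_events`, and `long_pauses`, and the 1000 ms default threshold, are illustrative choices and not part of Cassandra.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Pattern for the GC message format quoted in this thread (an assumption:
# other Cassandra versions may word these messages differently).
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)


class GcEvent(NamedTuple):
    collector: str    # e.g. "ParNew" or "ConcurrentMarkSweep"
    pause_ms: int     # reported collection time in milliseconds
    collections: int  # number of collections covered by the message
    used_bytes: int   # heap in use, in bytes
    max_bytes: int    # maximum heap size, in bytes


def parse_gc_events(lines: Iterable[str]) -> Iterator[GcEvent]:
    """Yield one GcEvent per line that contains a GC message."""
    for line in lines:
        m = GC_LINE.search(line)
        if m:
            yield GcEvent(
                m["collector"],
                int(m["ms"]),
                int(m["count"]),
                int(m["used"]),
                int(m["max"]),
            )


def long_pauses(lines: Iterable[str], threshold_ms: int = 1000) -> List[GcEvent]:
    """Return GC events whose reported time is at least threshold_ms."""
    return [e for e in parse_gc_events(lines) if e.pause_ms >= threshold_ms]


if __name__ == "__main__":
    # The two messages quoted in this thread, each on one line.
    sample = [
        "GC for ConcurrentMarkSweep: 5908 ms for 1 collections, 1986282520 used; max is 8375238656",
        "GC for ParNew: 230 ms for 1 collections, 4418571760 used; max is 8375238656",
    ]
    for event in long_pauses(sample):
        print(event.collector, event.pause_ms, "ms")
```

To run it against a real node, the `sample` list could be replaced with an open file handle on that node's system.log; the threshold is a starting point to adjust, not a recommended value.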

Is my best shot, then, to play with JVM settings and try to tune garbage collection?


On Thu, Sep 10, 2015 at 6:52 AM, Samuel CARRIERE <samuel.carriere@urssaf.fr> wrote:
Hi Roman, 
If it affects only a subset of nodes and it's always the same ones, it could be a "problem"
with your data model: maybe some (too) wide rows on these nodes.
If one of your rows is too wide, the deserialisation of the column index of this row can take
a lot of resources (disk, RAM, and CPU).
If you are using leveled compaction strategy and you see abnormally big sstables on those nodes,
it could be a clue.
Regards, 
Samuel 

Robert Wille <rwille@fold3.com> wrote on 10/09/2015 15:27:41:

> From: Robert Wille <rwille@fold3.com>
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: 10/09/2015 15:30
> Subject: Re: High CPU usage on some of nodes
> 
> It sounds like it's probably GC. Grep for GC in system.log to verify.
> If it is GC, there are a myriad of issues that could cause it, but 
> at least you’ve narrowed it down.
> 
> On Sep 9, 2015, at 11:05 PM, Roman Tkachenko <roman@mailgunhq.com> wrote:
> 
> > Hey guys,
> > 
> > We've been having issues in the past couple of days with CPU usage
> > / load average suddenly skyrocketing on some nodes of the cluster,
> > affecting performance significantly so that the majority of requests
> > start timing out. It can go on for several hours, with CPU spiking
> > through the roof, then coming back down to normal, and so on. Weirdly,
> > it affects only a subset of nodes and it's always the same ones. The
> > boxes Cassandra is running on are pretty beefy, 24 cores, and these
> > CPU spikes go up to >1000%.
> >
> > What is the best way to debug this kind of issue and find out
> > what Cassandra is doing during spikes like this? It doesn't seem to be
> > compaction related, as sometimes during these spikes "nodetool
> > compactionstats" says no compactions are running.
> > 
> > Thanks!
> > 
> 


