cassandra-user mailing list archives

From Anup Shirolkar <>
Subject Re: Odd CPU utilization spikes on 1 node out of 30 during repair
Date Thu, 27 Sep 2018 00:24:08 GMT

Most things look OK in your setup.

You can enable debug logging for the duration of the repair.
This will help identify whether you are hitting a bug or some other cause of the unusual behaviour.
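
A minimal Python sketch of that suggestion. It assumes `nodetool` is on the PATH of each node and that the loggers you want at DEBUG are `org.apache.cassandra.repair` and `org.apache.cassandra.streaming`. That logger choice and the helper names `repair_debug_logging`, `set_level_cmd` and `reset_cmd` are hypothetical, for illustration only. The sketch wraps `nodetool setlogginglevel <class> <level>` in a context manager so logging is reset even if the repair step fails. Calling `nodetool setlogginglevel` with no arguments resets logging to the initial configuration.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List, Sequence

# Hypothetical choice of loggers to raise; adjust to the classes you care about.
REPAIR_LOGGERS = ("org.apache.cassandra.repair", "org.apache.cassandra.streaming")


def set_level_cmd(logger: str, level: str) -> List[str]:
    """Build a `nodetool setlogginglevel <logger> <level>` command line."""
    return ["nodetool", "setlogginglevel", logger, level]


def reset_cmd() -> List[str]:
    """With no arguments, setlogginglevel resets to the initial logging configuration."""
    return ["nodetool", "setlogginglevel"]


def run(cmd: Sequence[str]) -> None:
    """Execute a command on the local node, raising if nodetool fails."""
    subprocess.run(list(cmd), check=True)


@contextmanager
def repair_debug_logging(
    runner: Callable[[Sequence[str]], None] = run,
    loggers: Sequence[str] = REPAIR_LOGGERS,
) -> Iterator[None]:
    """Raise the given loggers to DEBUG for the body of the `with` block."""
    for logger in loggers:
        runner(set_level_cmd(logger, "DEBUG"))
    try:
        yield
    finally:
        # Always restore the original levels, even if the body raised.
        runner(reset_cmd())
```

Usage would look like `with repair_debug_logging(): ...` around whatever starts or waits on the repair. For a Reaper-driven repair, that might simply be waiting until the run finishes. Since `nodetool` talks to the local node by default, this has to run on every node whose logs you want, or be adapted to pass `-h <host>`.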

Just a remote possibility: do you have other things running on the nodes
besides Cassandra?
Do they consume additional CPU at times?
You can check per-process CPU consumption to keep an eye on non-Cassandra
processes.
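
One way to get that per-process view without extra tooling is a Linux-only Python sketch. It samples `utime + stime` from `/proc/<pid>/stat` twice and reports each process's CPU as a percentage of one core, as `top` does by default. The function names are illustrative assumptions. On a Cassandra node the Cassandra JVM will normally show up with the command name `java`. Anything else near the top of the list is a non-Cassandra consumer worth a look.

```python
import os
import time
from typing import Dict, List, Tuple


def cpu_percent(ticks_before: int, ticks_after: int, elapsed_s: float, clk_tck: int) -> float:
    """CPU use over an interval as a percentage of ONE core (can exceed 100)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (ticks_after - ticks_before) / clk_tck / elapsed_s * 100.0


def read_proc_ticks(pid: int) -> Tuple[str, int]:
    """Return (comm, utime + stime in clock ticks) for a PID (Linux /proc only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # comm is in parentheses and may itself contain spaces or ')', so use the last ')'.
    comm = stat[stat.index("(") + 1 : stat.rindex(")")]
    # Fields after the paren start at field 3 (state); utime is field 14 and
    # stime field 15 (1-based, per proc(5)), i.e. indices 11 and 12 here.
    rest = stat[stat.rindex(")") + 2 :].split()
    return comm, int(rest[11]) + int(rest[12])


def snapshot() -> Dict[int, Tuple[str, int]]:
    """Read CPU ticks for every process currently visible in /proc."""
    snap: Dict[int, Tuple[str, int]] = {}
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                snap[int(entry)] = read_proc_ticks(int(entry))
            except OSError:
                continue  # process exited or is not readable
    return snap


def top_cpu(interval_s: float = 5.0, limit: int = 10) -> List[Tuple[float, int, str]]:
    """Sample twice and return the busiest processes as (percent, pid, comm)."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    before = snapshot()
    start = time.monotonic()
    time.sleep(interval_s)
    after = snapshot()
    elapsed = time.monotonic() - start
    rows = []
    for pid, (comm, ticks) in after.items():
        # Skip new PIDs and (weakly) guard against PID reuse by comparing comm.
        if pid in before and before[pid][0] == comm:
            rows.append((cpu_percent(before[pid][1], ticks, elapsed, clk_tck), pid, comm))
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__":
    for pct, pid, comm in top_cpu():
        print(f"{pct:6.1f}%  {pid:>7}  {comm}")
```

Run it periodically during the repair window (e.g. from cron) and log the output. That would show whether something besides Cassandra lines up with the spikes.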


Anup Shirolkar

On Wed, 26 Sep 2018 at 21:32, Oleksandr Shulgin <> wrote:

> On Wed, Sep 26, 2018 at 1:07 PM Anup Shirolkar <> wrote:
>
>> Looking at the information you have provided, the increased CPU utilisation
>> could be caused by the repair running on the node.
>> Repairs are resource-intensive operations.
>> Restarting the node would have halted the repair operation, bringing the CPU
>> back to normal.
>
> The repair was running on all nodes at the same time, yet only one node
> had CPU usage significantly different from the rest of the nodes.
> As I've mentioned, we are running non-incremental parallel repair using
> Cassandra Reaper.
> After the node was restarted, Reaper gave it new repair tasks
> and it was repairing as before, but this time
> without exposing the odd behavior.
>
>> In some cases, repairs trigger additional operations, e.g. compactions and
>> anti-compactions.
>> These operations could cause extra CPU utilisation.
>> What compaction strategy is used on the majority of keyspaces?
>
> For the 2 tables involved in this regular repair, we are using
> TimeWindowCompactionStrategy with time windows of 30 days.
>
>> Regarding CPU utilisation *percentage*: although it has doubled, the
>> increase is 15 percentage points.
>> It would be interesting to know the number of CPU cores on these nodes to
>> judge the absolute increase in CPU utilisation.
>
> All nodes use the same hardware on AWS EC2: r4.xlarge instances, which have
> 4 vCPUs.
>
>> You should try to find the root cause behind the behaviour and then decide
>> on a course of action.
>
> Sure, that's why I was asking for ideas on how to find the root cause. :-)
>
>> Effective use of monitoring and logs can help you identify the root cause.
>
> As I've mentioned, we do have monitoring and I've checked the logs, but
> so far that hasn't helped to identify the issue.
>
> Regards,
> --
> Alex
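
Here is a quick worked check of the absolute increase discussed above. It assumes the reported utilisation is averaged over all 4 vCPUs of an r4.xlarge, as instance-level metrics typically are. It also assumes the "15%" is an increase in percentage points. Here \(\Delta U\) is the change in utilisation and \(N_{\text{vCPU}}\) the number of vCPUs:

$$
\Delta_{\text{vCPU}} = \frac{\Delta U}{100\%} \times N_{\text{vCPU}} = \frac{15\%}{100\%} \times 4 = 0.6
$$

So under those assumptions the odd node was doing roughly 0.6 of a vCPU's worth of extra work. That is well below one fully busy core.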
