cassandra-user mailing list archives

From Anup Shirolkar <anup.shirol...@instaclustr.com>
Subject Re: Odd CPU utilization spikes on 1 node out of 30 during repair
Date Wed, 26 Sep 2018 11:06:46 GMT
Hi,

Looking at the information you have provided, the increased CPU utilisation
could be caused by the repair running on that node.
Repairs are resource-intensive operations.

Restarting the node would have halted the repair operation, bringing the CPU
back to normal.

If you run repairs regularly and this is the first time you have observed an
increase in CPU utilisation, it could be an area of concern. Otherwise,
repairs using extra CPU is normal.

In some cases, repairs trigger additional operations, e.g. compactions and
anti-compactions.
These operations could cause extra CPU utilisation.
What is the compaction strategy used on the majority of keyspaces?

As for the CPU utilisation *percentage*: although it has doubled, the
increase is only 15 percentage points.
It would be interesting to know the number of CPU cores on these nodes to
judge the absolute increase in CPU utilisation.
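
To make the point about absolute load concrete, here is a rough sketch of
the arithmetic, assuming the reported percentages are averaged across all
cores of a node. The core count of 8 is purely an assumption for
illustration; the original post does not state it.

```python
def busy_cores(utilisation_pct: float, cores: int) -> float:
    """Convert whole-machine CPU utilisation (%) into the equivalent
    number of fully busy cores."""
    return utilisation_pct / 100.0 * cores


# Figures from the thread: ~15% on a typical node, up to 30% on the odd one.
# ASSUMPTION: 8 cores per node (not stated in the original post).
cores = 8
normal = busy_cores(15, cores)
spiking = busy_cores(30, cores)

print(f"typical node: ~{normal:.1f} busy cores")
print(f"odd node:     ~{spiking:.1f} busy cores")
print(f"difference:   ~{spiking - normal:.1f} busy cores")
```

With more cores per node the same 15 percentage points would correspond to
proportionally more threads kept busy, which is why the core count matters.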

You should try to find the root cause of this behaviour and then decide on a
course of action.
Effective use of monitoring and logs can help you identify the root cause.
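
One possible way to narrow down the root cause next time, instead of
restarting straight away, is to note which OS thread is burning CPU (for
example from `top -H -p <pid>`) and look it up in a JVM thread dump taken
with `jstack`, which prints the native thread id in hex as `nid=0x...`. The
sketch below only does the matching step; the helper name and the sample
dump are hypothetical and not taken from the affected node.

```python
import re


def find_thread_by_tid(thread_dump: str, tid: int):
    """Return the thread-dump header line whose nid equals the given
    decimal OS thread id, or None if no thread matches."""
    for line in thread_dump.splitlines():
        match = re.search(r"\bnid=(0x[0-9a-fA-F]+)", line)
        # jstack prints the native id in hex; top -H shows it in decimal.
        if match and int(match.group(1), 16) == tid:
            return line.strip()
    return None


# HYPOTHETICAL sample dump, for illustration only.
SAMPLE_DUMP = """
"CompactionExecutor:3" #112 daemon prio=1 os_prio=4 tid=0x00007f3a1c0d2000 nid=0x1a2b runnable [0x00007f39e4a5c000]
"ValidationExecutor:1" #140 daemon prio=1 os_prio=4 tid=0x00007f3a1c1e4800 nid=0x1a3f runnable [0x00007f39e3f51000]
"""

# Suppose top -H reported thread 6719 as the busiest one (assumed value).
print(find_thread_by_tid(SAMPLE_DUMP, 6719))
```

The name of the matching thread (a compaction, validation or other executor)
would then hint at which kind of operation is consuming the CPU.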

Regards,
Anup

On Wed, 26 Sep 2018 at 17:34, Oleksandr Shulgin <
oleksandr.shulgin@zalando.de> wrote:

> Hello,
>
> On our production cluster of 30 Apache Cassandra 3.0.17 nodes we have
> observed that only one node started to show about 2 times the CPU
> utilization as compared to the rest (see screenshot): up to 30% vs. ~15% on
> average for the other nodes.
>
> This started more or less immediately after repair was started (using
> Cassandra Reaper, parallel, non-incremental) and lasted up until we've
> restarted this node.  After restart the CPU use is in line with the rest of
> nodes.
>
> All other metrics that we are monitoring for these nodes were in line with
> the rest of the cluster.
>
> The logs on the node don't show anything odd, no extra warn/error/info
> messages, not more minor or major GC runs as compared to other nodes during
> the time we were observing this behavior.
>
> What could be the reason for this behavior?  How should we debug it if
> that happens next time instead of just restarting?
>
> Cheers,
> --
> Alex
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: user-help@cassandra.apache.org



-- 

Anup Shirolkar

Consultant

+61 420 602 338

