cassandra-user mailing list archives

From onmstester onmstester <onmstes...@zoho.com.INVALID>
Subject Fwd: Re: High CPU usage on some of the nodes due to message coalesce
Date Sun, 21 Oct 2018 12:52:01 GMT
I don't think the root cause is related to the Cassandra config, because the nodes are homogeneous and the config is the same on all of them (16GB heap with default GC). The mutation counter and Native Transport counter are also the same on all of the nodes, but only these 3 nodes are experiencing 100% CPU usage (the others have less than 20% CPU usage). I even decommissioned these 3 nodes from the cluster and re-added them, but the result is still the same. The cluster is OK without these 3 nodes (in the state where these nodes are decommissioned).

Sent using Zoho Mail

============ Forwarded message ============
From : Chris Lohfink <clohfink@apple.com>
To : <user@cassandra.apache.org>
Date : Sat, 20 Oct 2018 23:24:03 +0330
Subject : Re: High CPU usage on some of the nodes due to message coalesce
============ Forwarded message ============

1s young GCs are horrible
and likely the cause of some of your bad metrics. How large are your mutations/query results, and what GC/heap settings are you using? You can use https://github.com/aragozin/jvm-tools to see the threads generating allocation pressure and using the CPU (ttop) and what garbage is being created (hh --dead-young).

Just a shot in the dark: I would guess you have rather large mutations putting pressure on the commitlog and heap. G1 with a larger heap might help in that scenario to reduce fragmentation and adjust its eden and survivor regions to the allocation rate better (but give it a bigger reserve space), but there are limits to what can help if you can't change your workload. Without more info on schema etc. it's hard to tell, but maybe that can give you some ideas on places to look. It could just as likely be repair coordination, wide partition reads, or compactions, so you need to look more at what within the app is causing the pressure to know whether it's possible to improve with settings or whether the load your application is producing exceeds what your cluster can handle (needs more nodes).

Chris

On Oct 20, 2018,
at 5:18 AM, onmstester onmstester <onmstester@zoho.com.INVALID> wrote:

3 nodes in my cluster have 100% CPU usage, and most of it is used by org.apache.cassandra.util.coalesceInternal and SepWorker.run. The most active threads are the messaging-service-incoming threads. The other nodes are normal. The cluster has 30 nodes using a rack-aware strategy, with 10 racks of 3 nodes each. The problematic nodes are all configured for one rack. Under normal write load, system.log reports too many dropped hint messages (cross node). There are also a lot of ParNew GCs of about 700-1000ms, and the commit log disk, which is isolated, is about 80-90% utilized. On startup of these 3 nodes, there are a lot of "updating topology" logs (1000s of them pending). Using iperf, I'm sure that the network is OK. Checking NTP and mutations on each node, load is balanced among the nodes. Using Apache Cassandra 3.11.2.

I cannot figure out the root cause of the problem, although there are some obvious symptoms.

Best Regards

Sent using Zoho Mail
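
To compare the ParNew pauses described in this thread between the 3 affected nodes and the healthy ones, one option is to summarize the GCInspector entries in each node's system.log. The Python below is only a rough sketch. It assumes GCInspector lines contain text of the form "ParNew GC in 742ms", and the function name summarize_gc_pauses, the 500 ms threshold, and the sample lines are all made up for illustration (none of them come from the thread). Adjust the pattern if your log lines look different.

```python
import re
from statistics import mean

# Assumed shape of a GCInspector entry in system.log; adjust if your logs differ.
GC_LINE = re.compile(r"GCInspector\.java:\d+ - (?P<collector>\S+) GC in (?P<ms>\d+)ms")


def summarize_gc_pauses(lines, threshold_ms=500):
    """Group pause durations by collector and count pauses >= threshold_ms."""
    pauses = {}
    for line in lines:
        match = GC_LINE.search(line)
        if match:
            pauses.setdefault(match.group("collector"), []).append(int(match.group("ms")))
    return {
        collector: {
            "count": len(values),
            "mean_ms": mean(values),
            "max_ms": max(values),
            "over_threshold": sum(1 for v in values if v >= threshold_ms),
        }
        for collector, values in pauses.items()
    }


# Made-up sample lines for illustration only (not taken from the thread).
SAMPLE = [
    "INFO  [Service Thread] 2018-10-20 05:00:01,000 GCInspector.java:284 - ParNew GC in 700ms.  CMS Old Gen: 1 -> 2",
    "WARN  [Service Thread] 2018-10-20 05:00:03,000 GCInspector.java:282 - ParNew GC in 1000ms.  CMS Old Gen: 2 -> 3",
    "INFO  [Service Thread] 2018-10-20 05:00:05,000 GCInspector.java:284 - ParNew GC in 250ms.  CMS Old Gen: 3 -> 4",
    "INFO  [main] 2018-10-20 05:00:06,000 StorageService.java:1 - unrelated line",
]

if __name__ == "__main__":
    for collector, stats in summarize_gc_pauses(SAMPLE).items():
        print(collector, stats)
```

Running the same summary against system.log from a healthy node and from an affected node would show whether the long young-generation pauses are specific to the 3 problematic nodes. To use a real log, pass an open file object instead of SAMPLE.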