cassandra-user mailing list archives

From Erick Ramirez <erick.rami...@datastax.com>
Subject Re: Hints replays very slow in one DC
Date Thu, 27 Feb 2020 02:54:59 GMT
>
> Nodes are going down due to Out of Memory errors, and we are using a 31GB
> heap in DC1; however, DC2 (which serves the traffic) has a 16GB heap.
> We had to increase the heap in DC1 because DC1 nodes were going down
> with Out of Memory issues, but DC2 nodes never went down.
>

It doesn't sound right that the primary DC is DC2 but DC1 is the one under
load. You might not be aware of it, but the symptom suggests DC1 is getting
hit with lots of traffic. If you run netstat (or whatever utility/tool of
your choice) on the DC1 nodes, you should see the established connections to
the cluster. That should give you clues as to where the traffic is coming from.
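
As a rough illustration of that netstat check, here is a minimal Python
sketch that groups established connections by remote address. It assumes
Linux-style `netstat -tn` output and that clients connect on the default
native transport port 9042; the function name and the sample addresses are
made up for illustration, so adjust them to your environment.

```python
from collections import Counter

CQL_PORT = 9042  # assumed: default native transport port; change if yours differs


def count_clients(netstat_output, local_port=CQL_PORT):
    """Count ESTABLISHED TCP connections to local_port, grouped by remote IP."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Linux `netstat -tn` rows: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        # Keep only connections that terminate on the local CQL port.
        if local.rsplit(":", 1)[-1] != str(local_port):
            continue
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts


# Illustrative sample only; these addresses are not from the original thread.
sample = """\
tcp        0      0 10.0.1.5:9042      10.0.2.7:51234     ESTABLISHED
tcp        0      0 10.0.1.5:9042      10.0.2.7:51240     ESTABLISHED
tcp        0      0 10.0.1.5:9042      10.0.3.9:40022     ESTABLISHED
tcp        0      0 10.0.1.5:7000      10.0.1.6:38110     ESTABLISHED
tcp        0      0 10.0.1.5:9042      10.0.4.4:40100     TIME_WAIT
"""

for ip, n in count_clients(sample).most_common():
    print(ip, n)
```

Feeding it the real output of `netstat -tn` from a DC1 node should show
which client hosts hold the most connections, which is the clue to chase.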


> We also noticed messages like the following in system.log:
> FailureDetector.java:288 - Not marking nodes down due to local pause of
> 9532654114 > 5000000000
>

That's another smoking gun that the nodes are buried in GC. A 9.5-second
pause is significant. Slow hinted handoffs are really the least of your
problems right now. If nodes weren't going down, there wouldn't be hints to
hand off in the first place. Cheers!
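
For reference, the 9.5-second figure comes from reading the two numbers in
the FailureDetector message as nanoseconds (an assumption, but one that
matches the 5-second threshold on the right-hand side). A minimal Python
sketch of that conversion, with a hypothetical helper name:

```python
import re

# Matches the "local pause of <observed> > <threshold>" part of the message.
PAUSE_RE = re.compile(r"local pause of (\d+)\s*>\s*(\d+)")


def parse_local_pause(message):
    """Return (pause_seconds, threshold_seconds), or None if no match.

    Assumes both values in the log message are nanoseconds.
    """
    match = PAUSE_RE.search(message)
    if match is None:
        return None
    pause_ns, threshold_ns = (int(g) for g in match.groups())
    return pause_ns / 1e9, threshold_ns / 1e9


line = ("FailureDetector.java:288 - Not marking nodes down due to "
        "local pause of 9532654114 > 5000000000")
pause_s, threshold_s = parse_local_pause(line)
print(f"pause {pause_s:.2f}s vs threshold {threshold_s:.0f}s")
# prints "pause 9.53s vs threshold 5s"
```

Running something like this over system.log gives a quick picture of how
often and how long the nodes are stalling.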

