incubator-cassandra-user mailing list archives

From Chris Baron <Chris.Ba...@ip-soft.net>
Subject Hinted Handoff/GC Tuning Headache
Date Wed, 16 Feb 2011 04:25:32 GMT
I recently upgraded my 8-node cluster from 0.6.6 to 0.7.0 (and even more recently to 0.7.1) for ExpiringColumn,
among the many other spectacular improvements.
 
I retuned the GC settings based on experience from 0.6.6 and the new defaults.
 
After about a week, two of the nodes were very far behind on minor compactions (2k+ SSTables
per CF and growing, 20k+ pending compactions).  The SSTable switch rate on these two nodes
was about 10x higher than on the other nodes.  I also observed rolling long-pause deaths (Gossip
reporting node X as dead): seemingly every three minutes, one of the nodes would go into a long GC pause.
 I also saw this behavior when I upgraded from 0.6.6 to 0.6.8, but I rolled back to 0.6.6
because I did not have time to investigate further. (I later found this: https://issues.apache.org/jira/browse/CASSANDRA-1656)
 
I eventually traced this behavior back to a nasty interaction between Hinted Handoff and GC
tuned for normal operating conditions.  
 
If I understand the code correctly, when a node replays a hint it reads the hinted data directly
from the application tables (read: my ColumnFamily).  If the replaying node happens to also
be a replica, it will resend the entire row, even if only one column was mutated. 
Because of the rolling GC pause deaths, the hint replays rarely succeeded, and when they did it wasn’t
long before a new set of hints was recorded.
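
To put a rough number on that amplification, here is a toy Python model. It is not Cassandra code: the row shape, column names, and sizes are made up for illustration, and it simply assumes my reading of the replay path is right (whole row resent versus only the mutated column).

```python
def replay_payload_bytes(row, mutated_columns, resend_whole_row):
    """Approximate bytes sent for one hint replay.

    row: dict mapping column name (str) to value (bytes).
    mutated_columns: names of the columns the original write touched.
    resend_whole_row: True models the behaviour described above,
    False models sending only what was actually mutated.
    """
    if resend_whole_row:
        return sum(len(name) + len(value) for name, value in row.items())
    return sum(len(name) + len(row[name]) for name in mutated_columns)


# Hypothetical wide row: 1000 columns of 100 bytes each.
row = {f"col{i:04d}": b"x" * 100 for i in range(1000)}

whole = replay_payload_bytes(row, ["col0000"], resend_whole_row=True)
single = replay_payload_bytes(row, ["col0000"], resend_whole_row=False)
amplification = whole / single  # grows with the number of columns in the row
```

In this model the amplification factor is simply the number of columns in the row, so wide rows would be hit hardest.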
 
Disabling Hinted Handoff has fixed this problem for me.
 
When I looked further into the intermittent GC issues, the verbose GC log showed ParNew promotion failures,
so I conservatively lowered CMSInitiatingOccupancyFraction, MAX_NEWSIZE, and in_memory_compaction_limit_in_mb.
 I’m now seeing long CMS times (8000ms+) but no failures, which leads me to believe a 6G heap
may be too large for the current tuning.
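
For anyone wanting to check their own logs for the same symptoms, a minimal Python sketch follows. It assumes a HotSpot log written with -XX:+PrintGCDetails, where (as far as I recall) the failures appear as the literal strings "promotion failed" and "concurrent mode failure" and each event carries a "real=N.NN secs" timing. The function name and the 1-second threshold are my own choices, not anything from Cassandra or the JVM.

```python
import re

# Marker strings as I recall HotSpot printing them; adjust if your log differs.
PROMOTION_FAILED = "promotion failed"
CONCURRENT_MODE_FAILURE = "concurrent mode failure"
REAL_SECS = re.compile(r"real=(\d+\.\d+) secs")


def summarize_gc_log(lines, slow_secs=1.0):
    """Count failure markers and collect wall-clock times of slow GC events."""
    summary = {
        "promotion_failed": 0,
        "concurrent_mode_failure": 0,
        "slow_events": [],
    }
    for line in lines:
        if PROMOTION_FAILED in line:
            summary["promotion_failed"] += 1
        if CONCURRENT_MODE_FAILURE in line:
            summary["concurrent_mode_failure"] += 1
        # Concurrent CMS phases run alongside the application, so their
        # elapsed time is not a pause; skip them here.
        if "CMS-concurrent" in line:
            continue
        match = REAL_SECS.search(line)
        if match and float(match.group(1)) >= slow_secs:
            summary["slow_events"].append(float(match.group(1)))
    return summary
```

Usage would be something like `summarize_gc_log(open("gc.log"))`, where the path is whatever your -Xloggc option points at.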
 
It’s worth noting that I saw no increase in ColumnFamily WriteCount or StorageProxy.WriteOperations;
only ColumnFamily MemtableColumnsCount and MemtableDataSize were increasing very rapidly on
the target node while hints were being replayed.

--
Chris