cassandra-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject GC freeze just after repair session
Date Tue, 03 Jul 2012 06:33:09 GMT
Recently, one of our servers froze for around 30-40 minutes, and many
mutations/reads were dropped. The freeze began just after a routine
nodetool repair of the CF below completed [1.0.7, NTS, DC1:3, DC2:2].

Column Family: MsgIrtConv
SSTable count: 12
Space used (live): 17426379140
Space used (total): 17426379140
Number of Keys (estimate): 122624
Memtable Columns Count: 31180
Memtable Data Size: 81950175
Memtable Switch Count: 31
Read Count: 8074156
Read Latency: 15.743 ms.
Write Count: 2172404
Write Latency: 0.037 ms.
Pending Tasks: 0
Bloom Filter False Postives: 1258
Bloom Filter False Ratio: 0.03598
Bloom Filter Space Used: 498672
Key cache capacity: 200000
Key cache size: 200000
Key cache hit rate: 0.9965579513062582
Row cache: disabled
Compacted row minimum size: 51
Compacted row maximum size: 89970660
Compacted row mean size: 226626


Our heap config is as follows:

-Xms8G -Xmx8G -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=5 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
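
For reference, the generation sizes implied by these flags can be worked
out with a little arithmetic. This is only a sketch, assuming the usual
HotSpot meaning of SurvivorRatio (eden size divided by the size of one
survivor space); the variable names are illustrative.

```python
# Sizes in MB, taken from the JVM flags above.
heap_mb = 8 * 1024       # -Xms8G -Xmx8G
young_mb = 800           # -Xmn800M
survivor_ratio = 8       # -XX:SurvivorRatio=8 (assumed: eden / one survivor space)
cms_fraction = 75        # -XX:CMSInitiatingOccupancyFraction=75

# Young generation = eden + 2 survivor spaces = (ratio + 2) survivor-sized units.
survivor_mb = young_mb / (survivor_ratio + 2)
eden_mb = survivor_mb * survivor_ratio

# Everything outside the young generation is the old (CMS) generation.
old_mb = heap_mb - young_mb

# Old-generation occupancy at which a CMS cycle is started.
cms_trigger_mb = old_mb * cms_fraction / 100

print(f"eden={eden_mb:.0f} MB, survivor={survivor_mb:.0f} MB each")
print(f"old gen={old_mb} MB, CMS starts at ~{cms_trigger_mb:.0f} MB")
```

Under that assumption the young generation is 640 MB of eden plus two
80 MB survivor spaces, and CMS starts once the 7392 MB old generation is
about 5544 MB full. The "max is 8506048512" value in the GCInspector
lines further down is exactly 8112 MB, which is the 8192 MB heap minus
one 80 MB survivor space.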

From the yaml:
in_memory_compaction_limit=64
compaction_throughput_mb_sec=8
multi_threaded_compaction=false

 INFO [AntiEntropyStage:1] 2012-06-29 09:21:26,085 AntiEntropyService.java
(line 762) [repair #2b6fcbf0-c1f9-11e1-0000-2ea8811bfbff] MsgIrtConv is
fully synced
 INFO [AntiEntropySessions:8] 2012-06-29 09:21:26,085
AntiEntropyService.java (line 698) [repair
#2b6fcbf0-c1f9-11e1-0000-2ea8811bfbff] session completed successfully
 INFO [CompactionExecutor:857] 2012-06-29 09:21:31,219 CompactionTask.java
(line 221) Compacted to
[/home/sas/system/data/ZMail/MsgIrtConv-hc-858-Data.db,].  47,907,012 to
40,554,059 (~84% of original) bytes for 4,564 keys at 6.252080MB/s.  Time:
6,186ms.

After this, the logs were filled with GC entries [ParNew/CMS]. ParNew ran
about every 3 seconds and CMS about every 30 seconds, continuously for
roughly 40 minutes.

 INFO [ScheduledTasks:1] 2012-06-29 09:23:39,921 GCInspector.java (line
122) GC for ParNew: 776 ms for 2 collections, 2901990208 used; max is
8506048512
 INFO [ScheduledTasks:1] 2012-06-29 09:23:42,265 GCInspector.java (line
122) GC for ParNew: 2028 ms for 2 collections, 3831282056 used; max is
8506048512

.........................................

 INFO [ScheduledTasks:1] 2012-06-29 10:07:53,884 GCInspector.java (line
122) GC for ParNew: 817 ms for 2 collections, 2808685768 used; max is
8506048512
 INFO [ScheduledTasks:1] 2012-06-29 10:07:55,632 GCInspector.java (line
122) GC for ParNew: 1165 ms for 3 collections, 3264696776 used; max is
8506048512
 INFO [ScheduledTasks:1] 2012-06-29 10:07:57,773 GCInspector.java (line
122) GC for ParNew: 1444 ms for 3 collections, 4234372296 used; max is
8506048512
 INFO [ScheduledTasks:1] 2012-06-29 10:07:59,387 GCInspector.java (line
122) GC for ParNew: 1153 ms for 2 collections, 4910279080 used; max is
8506048512
 INFO [ScheduledTasks:1] 2012-06-29 10:08:00,389 GCInspector.java (line
122) GC for ParNew: 697 ms for 2 collections, 4873857072 used; max is
8506048512
 INFO [ScheduledTasks:1] 2012-06-29 10:08:01,443 GCInspector.java (line
122) GC for ParNew: 726 ms for 2 collections, 4941511184 used; max is
8506048512
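
To summarize entries like the ones above across the whole 40-minute
window, the GCInspector lines can be parsed with a short script. This is
a rough sketch, not a Cassandra tool: the regular expression is written
against the line format shown in this mail, and the function and
variable names are made up for illustration.

```python
import re

# Two GCInspector entries copied from the excerpt above, wrapped as in the mail.
LOG = """
 INFO [ScheduledTasks:1] 2012-06-29 09:23:39,921 GCInspector.java (line
122) GC for ParNew: 776 ms for 2 collections, 2901990208 used; max is
8506048512
 INFO [ScheduledTasks:1] 2012-06-29 09:23:42,265 GCInspector.java (line
122) GC for ParNew: 2028 ms for 2 collections, 3831282056 used; max is
8506048512
"""

# Assumed format: "GC for <collector>: <ms> ms for <n> collections, <used> used; max is <max>"
PATTERN = re.compile(
    r"GC for (\w+): (\d+) ms for (\d+) collections, (\d+) used; max is (\d+)"
)

def parse_gc(text):
    """Return one dict per GCInspector entry found in the text."""
    flat = " ".join(text.split())  # undo the mail client's line wrapping
    entries = []
    for m in PATTERN.finditer(flat):
        entries.append({
            "collector": m.group(1),
            "pause_ms": int(m.group(2)),
            "collections": int(m.group(3)),
            "used": int(m.group(4)),
            "max": int(m.group(5)),
        })
    return entries

entries = parse_gc(LOG)
total_ms = sum(e["pause_ms"] for e in entries)
total_collections = sum(e["collections"] for e in entries)

print(f"{len(entries)} entries, {total_ms} ms in {total_collections} collections")
for e in entries:
    print(f'{e["collector"]}: heap {100 * e["used"] / e["max"]:.1f}% used after GC')
```

For the two sample entries this adds up to 2804 ms of ParNew time across
4 collections, reported about 2.3 seconds apart.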

After this, the node stabilized and was back up and running. Any pointers
would be greatly appreciated.
