incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: GC freeze just after repair session
Date Thu, 05 Jul 2012 22:33:38 GMT
> Ideally we would like to collect maximum garbage from ParNew itself, during compactions. What are the steps to take towards achieving this?
I'm not sure what you are asking. 

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/07/2012, at 6:56 PM, Ravikumar Govindarajan wrote:

> We have modified maxTenuringThreshold from 1 to 5. Maybe it is causing problems. We will change it back to 1 and see how the system behaves.
> 
> concurrent_compactors=8. We will reduce this, as our system cannot handle that many compactions at the same time anyway. We think it will also ease GC to some extent.
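> 
> As a rough sketch of both changes (assuming a stock install, with the JVM flags in conf/cassandra-env.sh and the compaction settings in conf/cassandra.yaml; the value 2 below is only an example):
> 
>     # conf/cassandra-env.sh -- put the tenuring threshold back to 1, so objects
>     # that survive a single ParNew collection are promoted straight away
>     JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
> 
>     # conf/cassandra.yaml -- fewer simultaneous compactions means less short-lived
>     # garbage being generated at any one time (needs a restart to take effect)
>     concurrent_compactors: 2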
> 
> Ideally we would like to collect maximum garbage from ParNew itself, during compactions. What are the steps to take towards achieving this?
> 
> On Wed, Jul 4, 2012 at 4:07 PM, aaron morton <aaron@thelastpickle.com> wrote:
> It *may* have been compaction from the repair, but it's not a big CF.
> 
> I would look at the logs to see how much data was transferred to the node. Was there a compaction going on while the GC storm was happening? Do you have a lot of secondary indexes?
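> 
> A quick way to check on the node itself (plain nodetool commands; replace <host> with the affected node):
> 
>     # is a compaction still running on the CF, and how far along is it?
>     nodetool -h <host> compactionstats
>     # are compaction or other stages backing up?
>     nodetool -h <host> tpstats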
> 
> If you think it is correlated with the compaction, you can try reducing concurrent_compactors.
> 
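> If it does look related, turning GC logging on in cassandra-env.sh makes it easier to line the collections up against the compaction entries in the log (standard HotSpot flags; the log path below is only an example):
> 
>     JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
>     JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
>     JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
>     JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
> 
> PrintTenuringDistribution in particular shows how many ParNew cycles objects survive before they are promoted, which is what MaxTenuringThreshold controls.
> 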
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 3/07/2012, at 6:33 PM, Ravikumar Govindarajan wrote:
> 
>> Recently, we faced a severe freeze [around 30-40 mins] on one of our servers. There were many mutations/reads dropped. The issue happened just after a routine nodetool repair for the below CF completed [1.0.7, NTS, DC1:3,DC2:2].
>> 
>> 		Column Family: MsgIrtConv
>> 		SSTable count: 12
>> 		Space used (live): 17426379140
>> 		Space used (total): 17426379140
>> 		Number of Keys (estimate): 122624
>> 		Memtable Columns Count: 31180
>> 		Memtable Data Size: 81950175
>> 		Memtable Switch Count: 31
>> 		Read Count: 8074156
>> 		Read Latency: 15.743 ms.
>> 		Write Count: 2172404
>> 		Write Latency: 0.037 ms.
>> 		Pending Tasks: 0
>> 		Bloom Filter False Postives: 1258
>> 		Bloom Filter False Ratio: 0.03598
>> 		Bloom Filter Space Used: 498672
>> 		Key cache capacity: 200000
>> 		Key cache size: 200000
>> 		Key cache hit rate: 0.9965579513062582
>> 		Row cache: disabled
>> 		Compacted row minimum size: 51
>> 		Compacted row maximum size: 89970660
>> 		Compacted row mean size: 226626
>> 
>> 
>> Our heap config is as follows
>> 
>> -Xms8G -Xmx8G -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=5 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
>> 
>> from yaml
>> in_memory_compaction_limit=64
>> compaction_throughput_mb_sec=8
>> multi_threaded_compaction=false
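>> 
>> (Shorthand above; the corresponding cassandra.yaml entries read roughly as follows:)
>> 
>>     in_memory_compaction_limit_in_mb: 64
>>     compaction_throughput_mb_per_sec: 8
>>     multithreaded_compaction: false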
>> 
>>  INFO [AntiEntropyStage:1] 2012-06-29 09:21:26,085 AntiEntropyService.java (line 762) [repair #2b6fcbf0-c1f9-11e1-0000-2ea8811bfbff] MsgIrtConv is fully synced
>>  INFO [AntiEntropySessions:8] 2012-06-29 09:21:26,085 AntiEntropyService.java (line 698) [repair #2b6fcbf0-c1f9-11e1-0000-2ea8811bfbff] session completed successfully
>>  INFO [CompactionExecutor:857] 2012-06-29 09:21:31,219 CompactionTask.java (line 221) Compacted to [/home/sas/system/data/ZMail/MsgIrtConv-hc-858-Data.db,].  47,907,012 to 40,554,059 (~84% of original) bytes for 4,564 keys at 6.252080MB/s.  Time: 6,186ms.
>> 
>> After this, the logs were completely filled with GC activity [ParNew/CMS]. ParNew ran roughly every 3 seconds and CMS roughly every 30 seconds, continuously for about 40 minutes.
>> 
>>  INFO [ScheduledTasks:1] 2012-06-29 09:23:39,921 GCInspector.java (line 122) GC for ParNew: 776 ms for 2 collections, 2901990208 used; max is 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 09:23:42,265 GCInspector.java (line 122) GC for ParNew: 2028 ms for 2 collections, 3831282056 used; max is 8506048512
>> 
>> .........................................
>> 
>>  INFO [ScheduledTasks:1] 2012-06-29 10:07:53,884 GCInspector.java (line 122) GC for ParNew: 817 ms for 2 collections, 2808685768 used; max is 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 10:07:55,632 GCInspector.java (line 122) GC for ParNew: 1165 ms for 3 collections, 3264696776 used; max is 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 10:07:57,773 GCInspector.java (line 122) GC for ParNew: 1444 ms for 3 collections, 4234372296 used; max is 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 10:07:59,387 GCInspector.java (line 122) GC for ParNew: 1153 ms for 2 collections, 4910279080 used; max is 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 10:08:00,389 GCInspector.java (line 122) GC for ParNew: 697 ms for 2 collections, 4873857072 used; max is 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 10:08:01,443 GCInspector.java (line 122) GC for ParNew: 726 ms for 2 collections, 4941511184 used; max is 8506048512
>> 
>> After this, the node became stable and was back up and running. Any pointers would be greatly appreciated.
> 
> 

