incubator-cassandra-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: GC freeze just after repair session
Date Thu, 05 Jul 2012 06:56:04 GMT
We have modified maxTenuringThreshold from 1 to 5. Maybe that is causing the
problems. We will change it back to 1 and see how the system behaves.

concurrent_compactors=8. We will reduce this, as our system won't be able to
handle that many simultaneous compactions anyway. We think it will also ease
GC pressure to some extent.

Ideally we would like to collect as much garbage as possible in ParNew itself
during compactions. What steps should we take towards achieving this?
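One way to check whether objects are actually dying young enough for ParNew to
reclaim them is to enable HotSpot's tenuring-distribution logging. These are
standard JVM flags for JDK 6/7-era HotSpot; the log path is just an example:

```
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-Xloggc:/var/log/cassandra/gc.log
```

The tenuring distribution shows how many bytes survive at each age; if most
objects die at age 1, a low MaxTenuringThreshold is appropriate.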

On Wed, Jul 4, 2012 at 4:07 PM, aaron morton <aaron@thelastpickle.com> wrote:

> It *may* have been compaction from the repair, but it's not a big CF.
>
> I would look at the logs to see how much data was transferred to the node.
> Was there a compaction going on while the GC storm was happening? Do you
> have a lot of secondary indexes?
>
> If you think it correlated to compaction you can try reducing the
> concurrent_compactors
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 3/07/2012, at 6:33 PM, Ravikumar Govindarajan wrote:
>
> Recently, we faced a severe freeze [around 30-40 mins] on one of our
> servers. There were many mutations/reads dropped. The issue happened just
> after a routine nodetool repair for the below CF completed [1.0.7, NTS,
> DC1:3,DC2:2]
>
> Column Family: MsgIrtConv
> SSTable count: 12
> Space used (live): 17426379140
> Space used (total): 17426379140
> Number of Keys (estimate): 122624
> Memtable Columns Count: 31180
> Memtable Data Size: 81950175
> Memtable Switch Count: 31
> Read Count: 8074156
> Read Latency: 15.743 ms.
> Write Count: 2172404
> Write Latency: 0.037 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 1258
> Bloom Filter False Ratio: 0.03598
> Bloom Filter Space Used: 498672
> Key cache capacity: 200000
> Key cache size: 200000
> Key cache hit rate: 0.9965579513062582
> Row cache: disabled
> Compacted row minimum size: 51
> Compacted row maximum size: 89970660
> Compacted row mean size: 226626
>
>
> Our heap config is as follows
>
> -Xms8G -Xmx8G -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=5 -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
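As a sanity check, the generation sizes implied by those flags can be worked
out directly (assuming standard HotSpot semantics, where SurvivorRatio=8
splits the young generation eden:s0:s1 = 8:1:1):

```python
# Young generation sizing implied by -Xmn800M -XX:SurvivorRatio=8.
young_mb = 800
survivor_mb = young_mb // (8 + 2)      # each survivor space: 80 MB
eden_mb = young_mb - 2 * survivor_mb   # eden: 640 MB

# The old generation is the remainder of the 8 GB heap (-Xms8G -Xmx8G).
heap_mb = 8 * 1024
old_mb = heap_mb - young_mb            # 7392 MB

print(eden_mb, survivor_mb, old_mb)    # -> 640 80 7392
```

With only 80 MB per survivor space, a MaxTenuringThreshold of 5 can cause
heavy survivor-space copying and early promotion during compaction bursts.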
>
> from yaml
> in_memory_compaction_limit=64
> compaction_throughput_mb_sec=8
> multi_threaded_compaction=false
>
>  INFO [AntiEntropyStage:1] 2012-06-29 09:21:26,085 AntiEntropyService.java
> (line 762) [repair #2b6fcbf0-c1f9-11e1-0000-2ea8811bfbff] MsgIrtConv is
> fully synced
>  INFO [AntiEntropySessions:8] 2012-06-29 09:21:26,085 AntiEntropyService.java
> (line 698) [repair #2b6fcbf0-c1f9-11e1-0000-2ea8811bfbff] session completed
> successfully
>  INFO [CompactionExecutor:857] 2012-06-29 09:21:31,219 CompactionTask.java
> (line 221) Compacted to
> [/home/sas/system/data/ZMail/MsgIrtConv-hc-858-Data.db,].  47,907,012 to
> 40,554,059 (~84% of original) bytes for 4,564 keys at 6.252080MB/s.  Time:
> 6,186ms.
>
> After this, the logs were completely filled with GC activity [ParNew/CMS].
> ParNew ran roughly every 3 seconds and CMS roughly every 30 seconds,
> continuously for about 40 minutes.
>
>  INFO [ScheduledTasks:1] 2012-06-29 09:23:39,921 GCInspector.java (line
> 122) GC for ParNew: 776 ms for 2 collections, 2901990208 used; max is
> 8506048512
>  INFO [ScheduledTasks:1] 2012-06-29 09:23:42,265 GCInspector.java (line
> 122) GC for ParNew: 2028 ms for 2 collections, 3831282056 used; max is
> 8506048512
>
> .........................................
>
>  INFO [ScheduledTasks:1] 2012-06-29 10:07:53,884 GCInspector.java (line
> 122) GC for ParNew: 817 ms for 2 collections, 2808685768 used; max is
> 8506048512
>  INFO [ScheduledTasks:1] 2012-06-29 10:07:55,632 GCInspector.java (line
> 122) GC for ParNew: 1165 ms for 3 collections, 3264696776 used; max is
> 8506048512
>  INFO [ScheduledTasks:1] 2012-06-29 10:07:57,773 GCInspector.java (line
> 122) GC for ParNew: 1444 ms for 3 collections, 4234372296 used; max is
> 8506048512
>  INFO [ScheduledTasks:1] 2012-06-29 10:07:59,387 GCInspector.java (line
> 122) GC for ParNew: 1153 ms for 2 collections, 4910279080 used; max is
> 8506048512
>  INFO [ScheduledTasks:1] 2012-06-29 10:08:00,389 GCInspector.java (line
> 122) GC for ParNew: 697 ms for 2 collections, 4873857072 used; max is
> 8506048512
>  INFO [ScheduledTasks:1] 2012-06-29 10:08:01,443 GCInspector.java (line
> 122) GC for ParNew: 726 ms for 2 collections, 4941511184 used; max is
> 8506048512
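To put numbers on a storm like this, the GCInspector lines can be totalled
with a small script. The log format is assumed from the excerpts above, and
`summarize` is a hypothetical helper, not a Cassandra tool:

```python
import re

# Matches GCInspector output as it appears in the excerpts above, e.g.
# "GC for ParNew: 776 ms for 2 collections, 2901990208 used; max is 8506048512"
PATTERN = re.compile(
    r"GC for (\w+): (\d+) ms for (\d+) collections, (\d+) used; max is (\d+)"
)

def summarize(lines):
    """Return {collector: (total_pause_ms, total_collections)}."""
    totals = {}
    for line in lines:
        m = PATTERN.search(line)
        if not m:
            continue
        collector, ms, count = m.group(1), int(m.group(2)), int(m.group(3))
        pause, n = totals.get(collector, (0, 0))
        totals[collector] = (pause + ms, n + count)
    return totals

sample = [
    "INFO [ScheduledTasks:1] 2012-06-29 10:07:53,884 GCInspector.java (line "
    "122) GC for ParNew: 817 ms for 2 collections, 2808685768 used; max is 8506048512",
    "INFO [ScheduledTasks:1] 2012-06-29 10:07:55,632 GCInspector.java (line "
    "122) GC for ParNew: 1165 ms for 3 collections, 3264696776 used; max is 8506048512",
]
print(summarize(sample))  # -> {'ParNew': (1982, 5)}
```

Summing pause time over the 40-minute window gives the fraction of wall-clock
time lost to GC, which is a useful number to track before and after tuning.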
>
> After this, the node became stable and was back up and running. Any pointers
> would be greatly appreciated.
>
>
>
