My first guess is the 3 ParNew collections taking 35 seconds is why your clients had long response times. That is a very abnormal stat. Combined with the low memory usage at that time, it seems highly unlikely there was actually 35 seconds worth of GCing going on there. Unless there is anything interesting in the lines above that snippet.

I'd first double check that your new generation hasn't been accidentally set to something insanely large or small, just to rule that out (but judging from the 4G of heap that is set you are most likely using defaults, which is good). After that I'd try replacing that node as it could be a hardware problem.

Another reason I have seen large pauses on a linux VPS that might be worth looking into is low entropy (possibly due to someone else abusing your shared hardware).


On Wed, Jan 30, 2013 at 3:44 PM, Guillermo Barbero <guillermo.barbero@spotbros.com> wrote:
Guys,

First thanks a lot for your answers.. now I have a little more info, lets see if this helps:

As I said before we are using Cassandra 1.1.7 on a 6 nodes(Ips: 150-155) (Amazon EC2 XL instances 17GB RAM) and the problem is that in the last weeks we are seing that the performace of the cluster falls.. PHP commands that usually lasted a few milliseconds last up to 15 seconds for just 2/3 minutes every now and then (no specific pattern) (see commands timing in attached picture).

Today we have found this:

Prior to a fall in performance on all Cassandra nodes log I can see this:

Node 150:

INFO [GossipTasks:1] 2013-01-30 21:35:23,514 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
 INFO [GossipStage:1] 2013-01-30 21:35:40,666 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
 INFO [HintedHandoff:1] 2013-01-30 21:35:40,667 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:35:41,266 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live bytes, 298 ops)
 INFO [FlushWriter:2879] 2013-01-30 21:35:41,267 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live bytes, 298 ops)
 INFO [FlushWriter:2879] 2013-01-30 21:35:41,282 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db (84559 bytes) for commitlog position ReplayPosition(segmentId=1355151995930, position=32659991)
 INFO [CompactionExecutor:9098] 2013-01-30 21:35:41,283 CompactionTask.java (line 109) Compacting [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-275-Data.db'), SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db')]

Node 151:

 INFO [GossipTasks:1] 2013-01-30 21:35:25,689 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
 INFO [GossipStage:1] 2013-01-30 21:35:40,677 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
 INFO [HintedHandoff:1] 2013-01-30 21:35:40,677 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:35:41,548 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live bytes, 316 ops)
 INFO [FlushWriter:2636] 2013-01-30 21:35:41,549 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live bytes, 316 ops)
 INFO [FlushWriter:2636] 2013-01-30 21:35:41,563 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-266-Data.db (93711 bytes) for commitlog position ReplayPosition(segmentId=1354832000013, position=20033362)

Node 153:

 INFO [GossipTasks:1] 2013-01-30 21:35:24,822 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
 INFO [ScheduledTasks:1] 2013-01-30 21:35:24,896 GCInspector.java (line 122) GC for ParNew: 294 ms for 1 collections, 1539602448 used; max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:35:26,897 GCInspector.java (line 122) GC for ParNew: 202 ms for 1 collections, 1829975768 used; max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:35:27,898 GCInspector.java (line 122) GC for ParNew: 234 ms for 1 collections, 2037767104 used; max is 3886022656
 INFO [MemoryMeter:1] 2013-01-30 21:35:31,151 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.4195889775077013 (just-counted was 1.4195889775077013).  calculation took 109ms for 64 columns
 INFO [MemoryMeter:1] 2013-01-30 21:35:37,806 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.5276925191259334 (just-counted was 1.5276925191259334).  calculation took 11ms for 129 columns
 INFO [GossipStage:1] 2013-01-30 21:35:40,735 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
 INFO [HintedHandoff:1] 2013-01-30 21:35:40,742 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [MemoryMeter:1] 2013-01-30 21:35:43,184 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.561312033901826 (just-counted was 1.561312033901826).  calculation took 234ms for 144 columns
 INFO [HintedHandoff:1] 2013-01-30 21:35:43,549 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1120216043(96278/187900 serialized/live bytes, 288 ops)

Node 154:

INFO [HintedHandoff:1] 2013-01-30 21:35:00,729 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.0.0.153
 INFO [GossipTasks:1] 2013-01-30 21:35:26,432 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
 INFO [GossipStage:1] 2013-01-30 21:35:40,669 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
 INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:35:41,208 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live bytes, 286 ops)
 INFO [FlushWriter:2496] 2013-01-30 21:35:41,209 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live bytes, 286 ops)


Node 155:

INFO [GossipTasks:1] 2013-01-30 21:35:27,117 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
 INFO [GossipStage:1] 2013-01-30 21:35:40,668 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
 INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:35:41,279 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1083199446(94578/190654 serialized/live bytes, 316 ops)

Meanwhile on node 152:

INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-4175-Data.db (639626 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=11032499)
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,701 GCInspector.java (line 122) GC for ParNew: 35087 ms for 3 collections, 442940272 used; max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,934 StatusLogger.java (line 57) Pool Name                    Active   Pending   Blocked
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,941 StatusLogger.java (line 72) ReadStage                        32       343         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) RequestResponseStage              0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) ReadRepairStage                   0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) MutationStage                     0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) ReplicateOnWriteStage             0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) GossipStage                       0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) AntiEntropyStage                  0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line 72) MigrationStage                    0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line 72) StreamStage                       0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) MemtablePostFlusher               0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) FlushWriter                       0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) MiscStage                         0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) commitlog_archiver                0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) InternalResponseStage             0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) HintedHandoff                     0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 77) CompactionManager                 0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 89) MessagingService                n/a      0,24
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 99) Cache Type                     Size                 Capacity               KeysToSave                                                         Provider
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 100) KeyCache                     133569                  2184533                      all
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 106) RowCache                          0                        0                      all              org.apache.cassandra.cache.SerializingCacheProvider
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 113) ColumnFamily                Memtable ops,data
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 116) system.NodeIdInfo                         0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.IndexInfo                          0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.LocationInfo                       0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.Versions                         3,103
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.schema_keyspaces                   0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,950 StatusLogger.java (line 116) system.Migrations                         0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,059 StatusLogger.java (line 116) system.schema_columnfamilies                 0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.schema_columns                     0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.HintsColumnFamily               1,2369
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.Schema                             0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_mailPipes            9241,30380203
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_profilePics            427,1173642
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_SBVars                         0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_SBCounters                   5,232
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,092 StatusLogger.java (line 116) SB_NNCD.CF_userPhoneAgenda        7919,2332898
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_BSLUsers                       0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_mailTimeline                   0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_SBMessages          54390,15054201
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_NewsFeedTimeline        1866,7681861
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_BSLSpots                     1,616
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBUniqueIds                    0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBChecks                       0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_NFTimelines             423,533156
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBAppUsersProfilePic                 0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_SBLikes                     16,565
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_SBCodes                  154,93735
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_attachmentsPerUser           80,107551
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_comments                  58,77581
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_attachments             641,247796
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_SBStats                        0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_userReversedPhoneAgenda        9047,7132129
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_ACD.CF_GSMembers               823,1206779
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,113 StatusLogger.java (line 116) SB_ACD.CF_NSLoad                          0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,114 StatusLogger.java (line 116) SB_ACD.CF_sessions                        0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,115 StatusLogger.java (line 116) SB_ACD.CF_userGS                   193,577229
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line 116) SB_ACD.CF_userApps                      8,170
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line 116) SB_ACD.CF_GSData                     126,3503
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,117 StatusLogger.java (line 116) SB_ACD.CF_R2Notifications          1906,58481
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,118 StatusLogger.java (line 116) SB_ACD.CF_uniqueKeys                      0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,155 StatusLogger.java (line 116) SB_ACD.CF_userContactWannabes          953,438671
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,157 StatusLogger.java (line 116) SB_ACD.CF_SBApps                          0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,158 StatusLogger.java (line 116) SB_ACD.CF_SBAppUsers                      0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line 116) SB_ACD.CF_R1Notifications        1354,2070513
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line 116) SB_ACD.CF_SBVotes                  463,230386
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,161 StatusLogger.java (line 116) SB_ACD.CF_userContacts           2968,4849519
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line 116) SB_ACD.CF_sessionsSuspended                 0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line 116) SB_ACD.CF_userBlockedList               5,207
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,172 StatusLogger.java (line 116) SB_ACD.CF_users                65386,10852776
WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 GCInspector.java (line 145) Heap is 0.7514124302624755 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 StorageService.java (line 2880) Flushing CFS(Keyspace='SB_NNCD', ColumnFamily='CF_mailPipes') to relieve memory pressure
 INFO [ScheduledTasks:1] 2013-01-30 21:37:05,896 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live bytes, 9380 ops)
 INFO [FlushWriter:13] 2013-01-30 21:37:05,909 Memtable.java (line 264) Writing Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live bytes, 9380 ops)
 INFO [FlushWriter:13] 2013-01-30 21:37:06,385 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/SB_NNCD/CF_mailPipes/SB_NNCD-CF_mailPipes-hf-2165-Data.db (2049401 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=13578602)
 WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202) setting live ratio to maximum of 64.0 instead of 751.6512549537648
 INFO [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 213) CFS(Keyspace='SB_ACD', ColumnFamily='CF_R2Notifications') liveRatio is 64.0 (just-counted was 64.0).  calculation took 61ms for 8 columns
 INFO [MemoryMeter:1] 2013-01-30 21:40:47,274 Memtable.java (line 213) CFS(Keyspace='SB_ACD', ColumnFamily='CF_userGS') liveRatio is 5.309650177091864 (just-counted was 5.309650177091864).  calculation took 11ms for 170 columns
 INFO [HintedHandoff:1] 2013-01-30 21:42:03,709 HintedHandOffManager.java (line 296) Started hinted handoff for token: 85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:42:03,712 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live bytes, 2 ops)
 INFO [FlushWriter:14] 2013-01-30 21:42:03,713 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live bytes, 2 ops)
 INFO [FlushWriter:14] 2013-01-30 21:42:03,730 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db (1016 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=17323736)
 INFO [CompactionExecutor:48] 2013-01-30 21:42:03,731 CompactionTask.java (line 109) Compacting [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db'), SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-121-Data.db')]

The cluster is perfectly balanced:

Note: Ownership information does not include topology, please specify a keyspace. 
Address         DC          Rack        Status State   Load            Owns                Token                                       
                                                                                           141784319550391032739561396922763706368     
10.0.0.150      datacenter1 rack1       Up     Normal  13.25 GB        16.67%              0                                           
10.0.0.151      datacenter1 rack1       Up     Normal  10.31 GB        16.67%              28356863910078203714492389662765613056      
10.0.0.152      datacenter1 rack1       Up     Normal  8.58 GB         16.67%              56713727820156407428984779325531226112      
10.0.0.153      datacenter1 rack1       Up     Normal  8.1 GB          16.67%              85070591730234615865843651857942052864      
10.0.0.154      datacenter1 rack1       Up     Normal  8.28 GB         16.67%              113427455640312814857969558651062452224     
10.0.0.155      datacenter1 rack1       Up     Normal  9.38 GB         16.67%              141784319550391032739561396922763706368   


I guess something happens on 152 that makes other nodes believe it is dead, then does something to fix itself and comes back after (messing up performance in the meanwhile) .Does this makes any sense? Any clue of what might be going on? I can post more logs if you guys need

Thanks again!!

Guillermo



2013/1/30 Nate McCall <zznate.m@gmail.com>
What's the output of nodetool cfstats for those 2 column families on
cassNode2 and cassNode3? And what is the replication factor for this
cluster?

Per the previous reply, nodetool ring should show each of your nodes
with ~16.7% of the data if well balanced.

Also, the auto-detection for memory sizes in the startup script is a
little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of
ram. It usually ends up allocating 4g/400m (max and young) whereas
8g/800m will give you some more breathing room.

On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot <btalbot@aeriagames.com> wrote:
> My guess is that those one or two nodes with the gc pressure also have more
> rows in your big CF.  More rows could be due to imbalanced distribution if
> your'e not using a random partitioner or from those nodes not yet removing
> deleted rows which other nodes may have done.
>
> JVM heap space is used for a few things which scale with key count
> including:
> - bloom filter (for C* < 1.2)
> - index samples
>
> Other space is used but can be more easily controlled by tuning for
> - memtable
> - compaction
> - key cache
> - row cache
>
>
> So, if those nodes have more rows (check using "nodetool ring" or "nodetool
> cfstats") than the others you can try to:
> - reduce the number of rows by adding nodes, run manual / tune compactions
> to remove rows with expired tombstones, etc.
> - increase bloom filter fp chance
> - increase jvm heap size (don't go too big)
> - disable key or row cache
> - increase index sample interval
>
> Not all of those things are generally good especially to the extreme so
> don't go setting a 20 GB jvm heap without understanding the consequences for
> example.
>
> -Bryan
>
>
> On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero
> <guillermo.barbero@spotbros.com> wrote:
>>
>> Hi,
>>
>>   I'm viewing a weird behaviour in my cassandra cluster. Most of the
>> warning messages are due to Heap is % full. According to this link
>>
>> (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html)
>> there are two ways to "reduce pressure":
>> 1. Decrease the cache sizes
>> 2. Increase the index interval size
>>
>> Most of the flushes are in two column families (users and messages), I
>> guess that's because the most mutations are there.
>>
>> I still have not applied those changes to the production environment.
>> Do you recommend any other meassure? Should I set specific tunning for
>> these two CFs? Should I check another metric?
>>
>> Additionally, the distribution of warning messages is not uniform
>> along the cluster. Why could cassandra be doing this? What should I do
>> to find out how to fix this?
>>
>> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
>> the java version is the following:
>> java version "1.6.0_37"
>> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
>> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>>
>> The cassandra system.log is resumed here (numer of messages, cassandra
>> node, class that reports the message, first word of the message)
>> 2013-01-26
>>       5 cassNode0: GCInspector.java Heap
>>       5 cassNode0: StorageService.java Flushing
>>     232 cassNode2: GCInspector.java Heap
>>     232 cassNode2: StorageService.java Flushing
>>     104 cassNode3: GCInspector.java Heap
>>     104 cassNode3: StorageService.java Flushing
>>       3 cassNode4: GCInspector.java Heap
>>       3 cassNode4: StorageService.java Flushing
>>       3 cassNode5: GCInspector.java Heap
>>       3 cassNode5: StorageService.java Flushing
>>
>> 2013-01-27
>>       2 cassNode0: GCInspector.java Heap
>>       2 cassNode0: StorageService.java Flushing
>>       3 cassNode1: GCInspector.java Heap
>>       3 cassNode1: StorageService.java Flushing
>>     189 cassNode2: GCInspector.java Heap
>>     189 cassNode2: StorageService.java Flushing
>>     104 cassNode3: GCInspector.java Heap
>>     104 cassNode3: StorageService.java Flushing
>>       1 cassNode4: GCInspector.java Heap
>>       1 cassNode4: StorageService.java Flushing
>>       1 cassNode5: GCInspector.java Heap
>>       1 cassNode5: StorageService.java Flushing
>>
>> 2013-01-28
>>       2 cassNode0: GCInspector.java Heap
>>       2 cassNode0: StorageService.java Flushing
>>       1 cassNode1: GCInspector.java Heap
>>       1 cassNode1: StorageService.java Flushing
>>       1 cassNode2: AutoSavingCache.java Reducing
>>     343 cassNode2: GCInspector.java Heap
>>     342 cassNode2: StorageService.java Flushing
>>     181 cassNode3: GCInspector.java Heap
>>     181 cassNode3: StorageService.java Flushing
>>       4 cassNode4: GCInspector.java Heap
>>       4 cassNode4: StorageService.java Flushing
>>       3 cassNode5: GCInspector.java Heap
>>       3 cassNode5: StorageService.java Flushing
>>
>> 2013-01-29
>>       2 cassNode0: GCInspector.java Heap
>>       2 cassNode0: StorageService.java Flushing
>>       3 cassNode1: GCInspector.java Heap
>>       3 cassNode1: StorageService.java Flushing
>>     156 cassNode2: GCInspector.java Heap
>>     156 cassNode2: StorageService.java Flushing
>>      71 cassNode3: GCInspector.java Heap
>>      71 cassNode3: StorageService.java Flushing
>>       2 cassNode4: GCInspector.java Heap
>>       2 cassNode4: StorageService.java Flushing
>>       2 cassNode5: GCInspector.java Heap
>>       1 cassNode5: Memtable.java setting
>>       2 cassNode5: StorageService.java Flushing
>>
>> --
>>
>> Guillermo Barbero - Backend Team
>>
>> Spotbros Technologies
>
>
>



--
 
Guillermo Barbero - Backend Team

Spotbros Technologies

  




--
Derek Williams