Hello,

We recently moved from 0.6.6 to 0.6.13 on an 8-node cluster and started to see issues on two nodes, where memtables are being flushed at a high rate and compaction seems to have fallen behind.  As a result, a huge number of sstables has accumulated.  We are also seeing a high number of dropped reads, on these two nodes only.

Here are the log entries for the two nodes:

Node 11
2011-11-04_12:20:20.71219 '' WARN [DroppedMessagesLogger] 12:20:20,924 MessagingService.java:479 Dropped 126 READ messages in the last 5000ms
2011-11-04_12:20:20.92854 '' INFO [DroppedMessagesLogger] 12:20:20,924 GCInspector.java:143 Pool Name                    Active   Pending
2011-11-04_12:20:20.92874 '' INFO [DroppedMessagesLogger] 12:20:20,924 GCInspector.java:157 STREAM-STAGE                      0         0
2011-11-04_12:20:20.92895 '' INFO [DroppedMessagesLogger] 12:20:20,924 GCInspector.java:157 FILEUTILS-DELETE-POOL             0         0
2011-11-04_12:20:20.92915 '' INFO [FLUSH-WRITER-POOL:1] 12:20:20,924 Memtable.java:166 Completed flushing /var/lib/cassandra/data/current/SoundCloud/Activities-487528-Data.db (3619622 bytes)
2011-11-04_12:20:20.93263 '' INFO [DroppedMessagesLogger] 12:20:20,924 GCInspector.java:157 RESPONSE-STAGE                    0         0
2011-11-04_12:20:20.93263 '' INFO [DroppedMessagesLogger] 12:20:20,925 GCInspector.java:157 ROW-READ-STAGE                    8       348
2011-11-04_12:20:20.93264 '' INFO [DroppedMessagesLogger] 12:20:20,925 GCInspector.java:157 LB-OPERATIONS                     0         0
2011-11-04_12:20:20.93264 '' INFO [DroppedMessagesLogger] 12:20:20,925 GCInspector.java:157 MISCELLANEOUS-POOL                0         0
2011-11-04_12:20:20.93265 '' INFO [DroppedMessagesLogger] 12:20:20,925 GCInspector.java:157 GMFD                              0         0
2011-11-04_12:20:20.93265 '' INFO [DroppedMessagesLogger] 12:20:20,925 GCInspector.java:157 CONSISTENCY-MANAGER               0         0
2011-11-04_12:20:20.93265 '' INFO [DroppedMessagesLogger] 12:20:20,926 GCInspector.java:157 LB-TARGET                         0         0
2011-11-04_12:20:20.93266 '' INFO [DroppedMessagesLogger] 12:20:20,926 GCInspector.java:157 ROW-MUTATION-STAGE                0         0
2011-11-04_12:20:20.93267 '' INFO [DroppedMessagesLogger] 12:20:20,926 GCInspector.java:157 MESSAGE-STREAMING-POOL            0         0
2011-11-04_12:20:20.93267 '' INFO [DroppedMessagesLogger] 12:20:20,926 GCInspector.java:157 LOAD-BALANCER-STAGE               0         0
2011-11-04_12:20:20.93268 '' INFO [DroppedMessagesLogger] 12:20:20,926 GCInspector.java:157 FLUSH-SORTER-POOL                 0         0
2011-11-04_12:20:20.93268 '' INFO [DroppedMessagesLogger] 12:20:20,926 GCInspector.java:157 MEMTABLE-POST-FLUSHER             1         2
2011-11-04_12:20:20.93269 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:157 AE-SERVICE-STAGE                  0         0
2011-11-04_12:20:20.93269 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:157 FLUSH-WRITER-POOL                 1         2
2011-11-04_12:20:20.93269 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:157 HINTED-HANDOFF-POOL               1         6
2011-11-04_12:20:20.93270 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:161 CompactionManager               n/a      4089
2011-11-04_12:20:20.93270 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:165 ColumnFamily                Memtable ops,data  Row cache size/cap  Key cache size/cap
2011-11-04_12:20:20.93271 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:168 system.LocationInfo                       0,0                 0/0                 1/3
2011-11-04_12:20:20.93272 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:168 system.HintsColumnFamily                 4,46                 0/0                 2/6
2011-11-04_12:20:20.93272 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:168 SoundCloud.OwnActivities         28790,539601                 0/0        37303/200000
2011-11-04_12:20:20.93273 '' INFO [DroppedMessagesLogger] 12:20:20,928 GCInspector.java:168 SoundCloud.ExclusiveTracks        10230,207529                 0/0         3646/200000
2011-11-04_12:20:20.93273 '' INFO [DroppedMessagesLogger] 12:20:20,928 GCInspector.java:168 SoundCloud.Activities                    5,90                 0/0       200000/200000
2011-11-04_12:20:20.93274 '' INFO [DroppedMessagesLogger] 12:20:20,928 GCInspector.java:168 SoundCloud.IncomingTracks                 0,0                 0/0       200000/200000

Node 17
2011-11-04_12:21:55.15215 '' WARN [DroppedMessagesLogger] 12:21:55,417 MessagingService.java:479 Dropped 81 READ messages in the last 5000ms
2011-11-04_12:21:55.41788 '' INFO [DroppedMessagesLogger] 12:21:55,417 GCInspector.java:143 Pool Name                    Active   Pending
2011-11-04_12:21:55.41789 '' INFO [DroppedMessagesLogger] 12:21:55,418 GCInspector.java:157 STREAM-STAGE                      0         0
2011-11-04_12:21:55.41851 '' INFO [DroppedMessagesLogger] 12:21:55,418 GCInspector.java:157 FILEUTILS-DELETE-POOL             0         0
2011-11-04_12:21:55.41877 '' INFO [DroppedMessagesLogger] 12:21:55,418 GCInspector.java:157 RESPONSE-STAGE                    0         0
2011-11-04_12:21:55.42379 '' INFO [DroppedMessagesLogger] 12:21:55,419 GCInspector.java:157 ROW-READ-STAGE                    8       211
2011-11-04_12:21:55.42403 '' INFO [DroppedMessagesLogger] 12:21:55,419 GCInspector.java:157 LB-OPERATIONS                     0         0
2011-11-04_12:21:55.42427 '' INFO [DroppedMessagesLogger] 12:21:55,419 GCInspector.java:157 MISCELLANEOUS-POOL                0         0
2011-11-04_12:21:55.42448 '' INFO [DroppedMessagesLogger] 12:21:55,419 GCInspector.java:157 GMFD                              0         0
2011-11-04_12:21:55.42473 '' INFO [DroppedMessagesLogger] 12:21:55,419 GCInspector.java:157 CONSISTENCY-MANAGER               0         0
2011-11-04_12:21:55.42495 '' INFO [DroppedMessagesLogger] 12:21:55,420 GCInspector.java:157 LB-TARGET                         0         0
2011-11-04_12:21:55.42515 '' INFO [DroppedMessagesLogger] 12:21:55,420 GCInspector.java:157 ROW-MUTATION-STAGE                1         1
2011-11-04_12:21:55.42537 '' INFO [DroppedMessagesLogger] 12:21:55,420 GCInspector.java:157 MESSAGE-STREAMING-POOL            0         0
2011-11-04_12:21:55.42561 '' INFO [DroppedMessagesLogger] 12:21:55,420 GCInspector.java:157 LOAD-BALANCER-STAGE               0         0
2011-11-04_12:21:55.42580 '' INFO [DroppedMessagesLogger] 12:21:55,421 GCInspector.java:157 FLUSH-SORTER-POOL                 0         0
2011-11-04_12:21:55.42602 '' INFO [DroppedMessagesLogger] 12:21:55,421 GCInspector.java:157 MEMTABLE-POST-FLUSHER             1         3
2011-11-04_12:21:55.42626 '' INFO [DroppedMessagesLogger] 12:21:55,421 GCInspector.java:157 AE-SERVICE-STAGE                  0         0
2011-11-04_12:21:55.42649 '' INFO [DroppedMessagesLogger] 12:21:55,421 GCInspector.java:157 FLUSH-WRITER-POOL                 1         1
2011-11-04_12:21:55.42670 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:157 HINTED-HANDOFF-POOL               1         8
2011-11-04_12:21:55.42695 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:161 CompactionManager               n/a      3423
2011-11-04_12:21:55.42717 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:165 ColumnFamily                Memtable ops,data  Row cache size/cap  Key cache size/cap
2011-11-04_12:21:55.42832 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:168 system.LocationInfo                       0,0                 0/0                 1/2
2011-11-04_12:21:55.42833 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:168 system.HintsColumnFamily                  0,0                 0/0                 1/6
2011-11-04_12:21:55.42833 '' INFO [DroppedMessagesLogger] 12:21:55,422 GCInspector.java:168 SoundCloud.OwnActivities           2545,47090                 0/0        41956/200000
2011-11-04_12:21:55.42833 '' INFO [DroppedMessagesLogger] 12:21:55,423 GCInspector.java:168 SoundCloud.ExclusiveTracks           570,11872                 0/0         2645/200000
2011-11-04_12:21:55.42834 '' INFO [DroppedMessagesLogger] 12:21:55,423 GCInspector.java:168 SoundCloud.Activities          126085,2171439                 0/0       200000/200000
2011-11-04_12:21:55.42872 '' INFO [DroppedMessagesLogger] 12:21:55,423 GCInspector.java:168 SoundCloud.IncomingTracks       95470,1604563                 0/0       200000/200000

We have tried to run manual compactions, but these don't seem to happen, likely due to the high pending count.

I am wondering what the best way is to figure out what is blocking on these nodes, in order to get compaction back in the game.
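
As a rough starting point, the thread pool dumps above can be reduced to just the pools that have work queued. This is only a sketch: the regular expression is an assumption based on the line layout in the log excerpts in this mail, and the function name pending_by_pool is made up for illustration.

```python
import re

# Assumed layout of a pool line in the dump, taken from the excerpts above:
#   ... GCInspector.java:157 ROW-READ-STAGE      8       348
#   ... GCInspector.java:161 CompactionManager   n/a      4089
# Column family lines ("0,0  0/0  1/3") and the header line do not match.
POOL_LINE = re.compile(r"GCInspector\.java:\d+\s+(\S+)\s+(\d+|n/a)\s+(\d+)\s*$")


def pending_by_pool(lines):
    """Return {pool name: pending count} for pools with pending > 0."""
    pending = {}
    for line in lines:
        match = POOL_LINE.search(line)
        if match is None:
            continue
        name, _active, count = match.groups()
        if int(count) > 0:
            pending[name] = int(count)
    return pending


# A few lines copied from the Node 11 excerpt.
sample = [
    "2011-11-04_12:20:20.93263 '' INFO [DroppedMessagesLogger] 12:20:20,924 GCInspector.java:157 RESPONSE-STAGE                    0         0",
    "2011-11-04_12:20:20.93263 '' INFO [DroppedMessagesLogger] 12:20:20,925 GCInspector.java:157 ROW-READ-STAGE                    8       348",
    "2011-11-04_12:20:20.93270 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:161 CompactionManager               n/a      4089",
    "2011-11-04_12:20:20.93271 '' INFO [DroppedMessagesLogger] 12:20:20,927 GCInspector.java:168 system.LocationInfo                       0,0                 0/0                 1/3",
]

# Print the busiest pools first.
for pool, count in sorted(pending_by_pool(sample).items(), key=lambda kv: -kv[1]):
    print(pool, count)
```

Running this over successive dumps from the same node would show whether the pending counts for ROW-READ-STAGE and CompactionManager are growing or draining over time.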

I have considered isolating one node via the network to see if it can catch up once there is no load on it, but I am not sure of the negative side effects of that.

Any suggestions on resolving this?

Regards,

Jake

--
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: jake@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE