incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Guillermo Barbero <guillermo.barb...@spotbros.com>
Subject Re: too many warnings of Heap is full
Date Wed, 30 Jan 2013 22:44:24 GMT
Guys,

First thanks a lot for your answers.. now I have a little more info, lets
see if this helps:

As I said before we are using Cassandra 1.1.7 on a 6 nodes(Ips: 150-155)
(Amazon EC2 XL instances 17GB RAM) and the problem is that in the last
weeks we are seing that the performace of the cluster falls.. PHP commands
that usually lasted a few milliseconds last up to 15 seconds for just 2/3
minutes every now and then (no specific pattern) (see commands timing in
attached picture).

Today we have found this:

Prior to a fall in performance on all Cassandra nodes log I can see this:

*Node 150:*

INFO [GossipTasks:1] 2013-01-30 21:35:23,514 Gossiper.java (line 831)
InetAddress /10.0.0.152 is now dead.
 INFO [GossipStage:1] 2013-01-30 21:35:40,666 Gossiper.java (line 817)
InetAddress /10.0.0.152 is now UP
 INFO [HintedHandoff:1] 2013-01-30 21:35:40,667 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:35:41,266 ColumnFamilyStore.java
(line 659) Enqueuing flush of
Memtable-HintsColumnFamily@1264710747(84317/476272
serialized/live bytes, 298 ops)
 INFO [FlushWriter:2879] 2013-01-30 21:35:41,267 Memtable.java (line 264)
Writing Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live
bytes, 298 ops)
 INFO [FlushWriter:2879] 2013-01-30 21:35:41,282 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db
(84559 bytes) for commitlog position
ReplayPosition(segmentId=1355151995930, position=32659991)
 INFO [CompactionExecutor:9098] 2013-01-30 21:35:41,283 CompactionTask.java
(line 109) Compacting
[SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-275-Data.db'),
SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db')]

*Node 151:*
*
*
 INFO [GossipTasks:1] 2013-01-30 21:35:25,689 Gossiper.java (line 831)
InetAddress /10.0.0.152 is now dead.
 INFO [GossipStage:1] 2013-01-30 21:35:40,677 Gossiper.java (line 817)
InetAddress /10.0.0.152 is now UP
 INFO [HintedHandoff:1] 2013-01-30 21:35:40,677 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:35:41,548 ColumnFamilyStore.java
(line 659) Enqueuing flush of
Memtable-HintsColumnFamily@1453632632(93461/191418
serialized/live bytes, 316 ops)
 INFO [FlushWriter:2636] 2013-01-30 21:35:41,549 Memtable.java (line 264)
Writing Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live
bytes, 316 ops)
 INFO [FlushWriter:2636] 2013-01-30 21:35:41,563 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-266-Data.db
(93711 bytes) for commitlog position
ReplayPosition(segmentId=1354832000013, position=20033362)

*Node 153:*

* *INFO [GossipTasks:1] 2013-01-30 21:35:24,822 Gossiper.java (line 831)
InetAddress /10.0.0.152 is now dead.
 INFO [ScheduledTasks:1] 2013-01-30 21:35:24,896 GCInspector.java (line
122) GC for ParNew: 294 ms for 1 collections, 1539602448 used; max is
3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:35:26,897 GCInspector.java (line
122) GC for ParNew: 202 ms for 1 collections, 1829975768 used; max is
3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:35:27,898 GCInspector.java (line
122) GC for ParNew: 234 ms for 1 collections, 2037767104 used; max is
3886022656
 INFO [MemoryMeter:1] 2013-01-30 21:35:31,151 Memtable.java (line 213)
CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
1.4195889775077013 (just-counted was 1.4195889775077013).  calculation took
109ms for 64 columns
 INFO [MemoryMeter:1] 2013-01-30 21:35:37,806 Memtable.java (line 213)
CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
1.5276925191259334 (just-counted was 1.5276925191259334).  calculation took
11ms for 129 columns
 INFO [GossipStage:1] 2013-01-30 21:35:40,735 Gossiper.java (line 817)
InetAddress /10.0.0.152 is now UP
 INFO [HintedHandoff:1] 2013-01-30 21:35:40,742 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [MemoryMeter:1] 2013-01-30 21:35:43,184 Memtable.java (line 213)
CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
1.561312033901826 (just-counted was 1.561312033901826).  calculation took
234ms for 144 columns
 INFO [HintedHandoff:1] 2013-01-30 21:35:43,549 ColumnFamilyStore.java
(line 659) Enqueuing flush of
Memtable-HintsColumnFamily@1120216043(96278/187900
serialized/live bytes, 288 ops)

*Node 154:*
*
*
INFO [HintedHandoff:1] 2013-01-30 21:35:00,729 HintedHandOffManager.java
(line 392) Finished hinted handoff of 0 rows to endpoint /10.0.0.153
 INFO [GossipTasks:1] 2013-01-30 21:35:26,432 Gossiper.java (line 831)
InetAddress /10.0.0.152 is now dead.
 INFO [GossipStage:1] 2013-01-30 21:35:40,669 Gossiper.java (line 817)
InetAddress /10.0.0.152 is now UP
 INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:35:41,208 ColumnFamilyStore.java
(line 659) Enqueuing flush of Memtable-HintsColumnFamily@908822645(92404/185145
serialized/live bytes, 286 ops)
 INFO [FlushWriter:2496] 2013-01-30 21:35:41,209 Memtable.java (line 264)
Writing Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live
bytes, 286 ops)


Node 155:

INFO [GossipTasks:1] 2013-01-30 21:35:27,117 Gossiper.java (line 831)
InetAddress /10.0.0.152 is now dead.
 INFO [GossipStage:1] 2013-01-30 21:35:40,668 Gossiper.java (line 817)
InetAddress /10.0.0.152 is now UP
 INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:35:41,279 ColumnFamilyStore.java
(line 659) Enqueuing flush of
Memtable-HintsColumnFamily@1083199446(94578/190654
serialized/live bytes, 316 ops)

*Meanwhile on node 152:*

INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-4175-Data.db
(639626 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=11032499)
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,701 GCInspector.java (line
122) GC for ParNew: 35087 ms for 3 collections, 442940272 used; max is
3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,934 StatusLogger.java (line
57) Pool Name                    Active   Pending   Blocked
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,941 StatusLogger.java (line
72) ReadStage                        32       343         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
72) RequestResponseStage              0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
72) ReadRepairStage                   0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
72) MutationStage                     0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
72) ReplicateOnWriteStage             0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
72) GossipStage                       0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
72) AntiEntropyStage                  0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line
72) MigrationStage                    0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line
72) StreamStage                       0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
72) MemtablePostFlusher               0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
72) FlushWriter                       0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
72) MiscStage                         0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
72) commitlog_archiver                0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
72) InternalResponseStage             0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
72) HintedHandoff                     0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
77) CompactionManager                 0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
89) MessagingService                n/a      0,24
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
99) Cache Type                     Size                 Capacity
    KeysToSave
Provider
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
100) KeyCache                     133569                  2184533
           all
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
106) RowCache                          0                        0
           all
 org.apache.cassandra.cache.SerializingCacheProvider
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
113) ColumnFamily                Memtable ops,data
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
116) system.NodeIdInfo                         0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
116) system.IndexInfo                          0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
116) system.LocationInfo                       0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
116) system.Versions                         3,103
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
116) system.schema_keyspaces                   0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,950 StatusLogger.java (line
116) system.Migrations                         0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,059 StatusLogger.java (line
116) system.schema_columnfamilies                 0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
116) system.schema_columns                     0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
116) system.HintsColumnFamily               1,2369
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
116) system.Schema                             0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
116) SB_NNCD.CF_mailPipes            9241,30380203
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
116) SB_NNCD.CF_profilePics            427,1173642
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
116) SB_NNCD.CF_SBVars                         0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
116) SB_NNCD.CF_SBCounters                   5,232
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,092 StatusLogger.java (line
116) SB_NNCD.CF_userPhoneAgenda        7919,2332898
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
116) SB_NNCD.CF_BSLUsers                       0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
116) SB_NNCD.CF_mailTimeline                   0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
116) SB_NNCD.CF_SBMessages          54390,15054201
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
116) SB_NNCD.CF_NewsFeedTimeline        1866,7681861
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
116) SB_NNCD.CF_BSLSpots                     1,616
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
116) SB_NNCD.CF_SBUniqueIds                    0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
116) SB_NNCD.CF_SBChecks                       0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
116) SB_NNCD.CF_NFTimelines             423,533156
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
116) SB_NNCD.CF_SBAppUsersProfilePic                 0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
116) SB_NNCD.CF_SBLikes                     16,565
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
116) SB_NNCD.CF_SBCodes                  154,93735
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
116) SB_NNCD.CF_attachmentsPerUser           80,107551
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
116) SB_NNCD.CF_comments                  58,77581
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
116) SB_NNCD.CF_attachments             641,247796
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
116) SB_NNCD.CF_SBStats                        0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
116) SB_NNCD.CF_userReversedPhoneAgenda        9047,7132129
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
116) SB_ACD.CF_GSMembers               823,1206779
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,113 StatusLogger.java (line
116) SB_ACD.CF_NSLoad                          0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,114 StatusLogger.java (line
116) SB_ACD.CF_sessions                        0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,115 StatusLogger.java (line
116) SB_ACD.CF_userGS                   193,577229
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line
116) SB_ACD.CF_userApps                      8,170
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line
116) SB_ACD.CF_GSData                     126,3503
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,117 StatusLogger.java (line
116) SB_ACD.CF_R2Notifications          1906,58481
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,118 StatusLogger.java (line
116) SB_ACD.CF_uniqueKeys                      0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,155 StatusLogger.java (line
116) SB_ACD.CF_userContactWannabes          953,438671
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,157 StatusLogger.java (line
116) SB_ACD.CF_SBApps                          0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,158 StatusLogger.java (line
116) SB_ACD.CF_SBAppUsers                      0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line
116) SB_ACD.CF_R1Notifications        1354,2070513
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line
116) SB_ACD.CF_SBVotes                  463,230386
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,161 StatusLogger.java (line
116) SB_ACD.CF_userContacts           2968,4849519
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line
116) SB_ACD.CF_sessionsSuspended                 0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line
116) SB_ACD.CF_userBlockedList               5,207
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,172 StatusLogger.java (line
116) SB_ACD.CF_users                65386,10852776
WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 GCInspector.java (line 145)
Heap is 0.7514124302624755 full.  You may need to reduce memtable and/or
cache sizes.  Cassandra will now flush up to the two largest memtables to
free up memory.  Adjust flush_largest_memtables_at threshold in
cassandra.yaml if you don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 StorageService.java (line
2880) Flushing CFS(Keyspace='SB_NNCD', ColumnFamily='CF_mailPipes') to
relieve memory pressure
 INFO [ScheduledTasks:1] 2013-01-30 21:37:05,896 ColumnFamilyStore.java
(line 659) Enqueuing flush of Memtable-CF_mailPipes@192792072(6868296/30790355
serialized/live bytes, 9380 ops)
 INFO [FlushWriter:13] 2013-01-30 21:37:05,909 Memtable.java (line 264)
Writing Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live
bytes, 9380 ops)
 INFO [FlushWriter:13] 2013-01-30 21:37:06,385 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/SB_NNCD/CF_mailPipes/SB_NNCD-CF_mailPipes-hf-2165-Data.db
(2049401 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=13578602)
 WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202)
setting live ratio to maximum of 64.0 instead of 751.6512549537648
 INFO [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_R2Notifications') liveRatio is 64.0
(just-counted was 64.0).  calculation took 61ms for 8 columns
 INFO [MemoryMeter:1] 2013-01-30 21:40:47,274 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_userGS') liveRatio is
5.309650177091864 (just-counted was 5.309650177091864).  calculation took
11ms for 170 columns
 INFO [HintedHandoff:1] 2013-01-30 21:42:03,709 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:42:03,712 ColumnFamilyStore.java
(line 659) Enqueuing flush of Memtable-HintsColumnFamily@1676047624(950/2369
serialized/live bytes, 2 ops)
 INFO [FlushWriter:14] 2013-01-30 21:42:03,713 Memtable.java (line 264)
Writing Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live
bytes, 2 ops)
 INFO [FlushWriter:14] 2013-01-30 21:42:03,730 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db
(1016 bytes) for commitlog position ReplayPosition(segmentId=1359573115280,
position=17323736)
 INFO [CompactionExecutor:48] 2013-01-30 21:42:03,731 CompactionTask.java
(line 109) Compacting
[SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db'),
SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-121-Data.db')]

The cluster is perfectly balanced:

Note: Ownership information does not include topology, please specify a
keyspace.
Address         DC          Rack        Status State   Load            Owns
               Token

               141784319550391032739561396922763706368
10.0.0.150      datacenter1 rack1       Up     Normal  13.25 GB
 16.67%              0
10.0.0.151      datacenter1 rack1       Up     Normal  10.31 GB
 16.67%              28356863910078203714492389662765613056
10.0.0.152      datacenter1 rack1       Up     Normal  8.58 GB
16.67%              56713727820156407428984779325531226112
10.0.0.153      datacenter1 rack1       Up     Normal  8.1 GB
 16.67%              85070591730234615865843651857942052864
10.0.0.154      datacenter1 rack1       Up     Normal  8.28 GB
16.67%              113427455640312814857969558651062452224
10.0.0.155      datacenter1 rack1       Up     Normal  9.38 GB
16.67%              141784319550391032739561396922763706368


I guess something happens on 152 that makes other nodes believe it is dead,
then does something to fix itself and comes back after (messing up
performance in the meanwhile) .Does this makes any sense? Any clue of what
might be going on? I can post more logs if you guys need

Thanks again!!

Guillermo



2013/1/30 Nate McCall <zznate.m@gmail.com>

> What's the output of nodetool cfstats for those 2 column families on
> cassNode2 and cassNode3? And what is the replication factor for this
> cluster?
>
> Per the previous reply, nodetool ring should show each of your nodes
> with ~16.7% of the data if well balanced.
>
> Also, the auto-detection for memory sizes in the startup script is a
> little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of
> ram. It usually ends up allocating 4g/400m (max and young) whereas
> 8g/800m will give you some more breathing room.
>
> On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot <btalbot@aeriagames.com>
> wrote:
> > My guess is that those one or two nodes with the gc pressure also have
> more
> > rows in your big CF.  More rows could be due to imbalanced distribution
> if
> > your'e not using a random partitioner or from those nodes not yet
> removing
> > deleted rows which other nodes may have done.
> >
> > JVM heap space is used for a few things which scale with key count
> > including:
> > - bloom filter (for C* < 1.2)
> > - index samples
> >
> > Other space is used but can be more easily controlled by tuning for
> > - memtable
> > - compaction
> > - key cache
> > - row cache
> >
> >
> > So, if those nodes have more rows (check using "nodetool ring" or
> "nodetool
> > cfstats") than the others you can try to:
> > - reduce the number of rows by adding nodes, run manual / tune
> compactions
> > to remove rows with expired tombstones, etc.
> > - increase bloom filter fp chance
> > - increase jvm heap size (don't go too big)
> > - disable key or row cache
> > - increase index sample interval
> >
> > Not all of those things are generally good especially to the extreme so
> > don't go setting a 20 GB jvm heap without understanding the consequences
> for
> > example.
> >
> > -Bryan
> >
> >
> > On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero
> > <guillermo.barbero@spotbros.com> wrote:
> >>
> >> Hi,
> >>
> >>   I'm viewing a weird behaviour in my cassandra cluster. Most of the
> >> warning messages are due to Heap is % full. According to this link
> >>
> >> (
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html
> )
> >> there are two ways to "reduce pressure":
> >> 1. Decrease the cache sizes
> >> 2. Increase the index interval size
> >>
> >> Most of the flushes are in two column families (users and messages), I
> >> guess that's because the most mutations are there.
> >>
> >> I still have not applied those changes to the production environment.
> >> Do you recommend any other meassure? Should I set specific tunning for
> >> these two CFs? Should I check another metric?
> >>
> >> Additionally, the distribution of warning messages is not uniform
> >> along the cluster. Why could cassandra be doing this? What should I do
> >> to find out how to fix this?
> >>
> >> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
> >> the java version is the following:
> >> java version "1.6.0_37"
> >> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
> >> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
> >>
> >> The cassandra system.log is resumed here (numer of messages, cassandra
> >> node, class that reports the message, first word of the message)
> >> 2013-01-26
> >>       5 cassNode0: GCInspector.java Heap
> >>       5 cassNode0: StorageService.java Flushing
> >>     232 cassNode2: GCInspector.java Heap
> >>     232 cassNode2: StorageService.java Flushing
> >>     104 cassNode3: GCInspector.java Heap
> >>     104 cassNode3: StorageService.java Flushing
> >>       3 cassNode4: GCInspector.java Heap
> >>       3 cassNode4: StorageService.java Flushing
> >>       3 cassNode5: GCInspector.java Heap
> >>       3 cassNode5: StorageService.java Flushing
> >>
> >> 2013-01-27
> >>       2 cassNode0: GCInspector.java Heap
> >>       2 cassNode0: StorageService.java Flushing
> >>       3 cassNode1: GCInspector.java Heap
> >>       3 cassNode1: StorageService.java Flushing
> >>     189 cassNode2: GCInspector.java Heap
> >>     189 cassNode2: StorageService.java Flushing
> >>     104 cassNode3: GCInspector.java Heap
> >>     104 cassNode3: StorageService.java Flushing
> >>       1 cassNode4: GCInspector.java Heap
> >>       1 cassNode4: StorageService.java Flushing
> >>       1 cassNode5: GCInspector.java Heap
> >>       1 cassNode5: StorageService.java Flushing
> >>
> >> 2013-01-28
> >>       2 cassNode0: GCInspector.java Heap
> >>       2 cassNode0: StorageService.java Flushing
> >>       1 cassNode1: GCInspector.java Heap
> >>       1 cassNode1: StorageService.java Flushing
> >>       1 cassNode2: AutoSavingCache.java Reducing
> >>     343 cassNode2: GCInspector.java Heap
> >>     342 cassNode2: StorageService.java Flushing
> >>     181 cassNode3: GCInspector.java Heap
> >>     181 cassNode3: StorageService.java Flushing
> >>       4 cassNode4: GCInspector.java Heap
> >>       4 cassNode4: StorageService.java Flushing
> >>       3 cassNode5: GCInspector.java Heap
> >>       3 cassNode5: StorageService.java Flushing
> >>
> >> 2013-01-29
> >>       2 cassNode0: GCInspector.java Heap
> >>       2 cassNode0: StorageService.java Flushing
> >>       3 cassNode1: GCInspector.java Heap
> >>       3 cassNode1: StorageService.java Flushing
> >>     156 cassNode2: GCInspector.java Heap
> >>     156 cassNode2: StorageService.java Flushing
> >>      71 cassNode3: GCInspector.java Heap
> >>      71 cassNode3: StorageService.java Flushing
> >>       2 cassNode4: GCInspector.java Heap
> >>       2 cassNode4: StorageService.java Flushing
> >>       2 cassNode5: GCInspector.java Heap
> >>       1 cassNode5: Memtable.java setting
> >>       2 cassNode5: StorageService.java Flushing
> >>
> >> --
> >>
> >> Guillermo Barbero - Backend Team
> >>
> >> Spotbros Technologies
> >
> >
> >
>



-- 
<http://spotbros.com>
Guillermo Barbero - Backend Team

Spotbros Technologies
<http://www.facebook.com/spotbros>  <http://twitter.com/spotbros>

Mime
View raw message