incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nate McCall <zznat...@gmail.com>
Subject Re: too many warnings of Heap is full
Date Wed, 30 Jan 2013 23:09:40 GMT
Your latencies and distribution look fine.

How big/what types of queries are you issuing? Are you issuing a lot
of large multigets?

Also, do either of these column families have secondary indexes?

On Wed, Jan 30, 2013 at 2:59 PM, Guillermo Barbero
<guillermo.barbero@spotbros.com> wrote:
> Iep,
>
>   I missed the attachment...
>
>   Also, these are the cfstats of 152:
>
> Column Family: CF_SBMessages
> SSTable count: 3
> Space used (live): 967238560
> Space used (total): 967238560
> Number of Keys (estimate): 3263232
> Memtable Columns Count: 2112
> Memtable Data Size: 577135
> Memtable Switch Count: 5
> Read Count: 20702
> Read Latency: 0.181 ms.
> Write Count: 19888
> Write Latency: 0.059 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 14
> Bloom Filter False Ratio: 0.09375
> Bloom Filter Space Used: 6171168
> Compacted row minimum size: 536
> Compacted row maximum size: 24601
> Compacted row mean size: 850
>
> Column Family: CF_users
> SSTable count: 3
> Space used (live): 152205376
> Space used (total): 152205376
> Number of Keys (estimate): 343040
> Memtable Columns Count: 154398
> Memtable Data Size: 21159410
> Memtable Switch Count: 9
> Read Count: 11816
> Read Latency: 1.348 ms.
> Write Count: 128751
> Write Latency: 0.090 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 0
> Bloom Filter False Ratio: 0.00000
> Bloom Filter Space Used: 688464
> Compacted row minimum size: 61
> Compacted row maximum size: 3311
> Compacted row mean size: 1235
>
> And the cfstats of 153:
>
> Column Family: CF_SBMessages
> SSTable count: 5
> Space used (live): 965495648
> Space used (total): 965495648
> Number of Keys (estimate): 3257216
> Memtable Columns Count: 56541
> Memtable Data Size: 15699960
> Memtable Switch Count: 2
> Read Count: 22475
> Read Latency: 0.142 ms.
> Write Count: 21719
> Write Latency: 0.073 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 43
> Bloom Filter False Ratio: 0.19545
> Bloom Filter Space Used: 6161032
> Compacted row minimum size: 536
> Compacted row maximum size: 11864
> Compacted row mean size: 872
>
> Column Family: CF_users
> SSTable count: 2
> Space used (live): 148762893
> Space used (total): 148762893
> Number of Keys (estimate): 333312
> Memtable Columns Count: 129725
> Memtable Data Size: 17144125
> Memtable Switch Count: 4
> Read Count: 7440
> Read Latency: 1.329 ms.
> Write Count: 127465
> Write Latency: 0.093 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 0
> Bloom Filter False Ratio: 0.00000
> Bloom Filter Space Used: 694112
> Compacted row minimum size: 61
> Compacted row maximum size: 3311
> Compacted row mean size: 1298
>
> The messages are in a keyspace with a replication factor of 2 and the
> users are in a keyspace with a replication factor of 3.
>
> Ah, and we use the RandomPartitioner.
>
> Thanks again
>
>
> 2013/1/30 Guillermo Barbero <guillermo.barbero@spotbros.com>
>>
>> Guys,
>>
>> First thanks a lot for your answers.. now I have a little more info, lets see if
this helps:
>>
>> As I said before we are using Cassandra 1.1.7 on a 6 nodes(Ips: 150-155) (Amazon
EC2 XL instances 17GB RAM) and the problem is that in the last weeks we are seing that the
performace of the cluster falls.. PHP commands that usually lasted a few milliseconds last
up to 15 seconds for just 2/3 minutes every now and then (no specific pattern) (see commands
timing in attached picture).
>>
>> Today we have found this:
>>
>> Prior to a fall in performance on all Cassandra nodes log I can see this:
>>
>> Node 150:
>>
>> INFO [GossipTasks:1] 2013-01-30 21:35:23,514 Gossiper.java (line 831) InetAddress
/10.0.0.152 is now dead.
>>  INFO [GossipStage:1] 2013-01-30 21:35:40,666 Gossiper.java (line 817) InetAddress
/10.0.0.152 is now UP
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,667 HintedHandOffManager.java (line 296)
Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,266 ColumnFamilyStore.java (line 659)
Enqueuing flush of Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live bytes,
298 ops)
>>  INFO [FlushWriter:2879] 2013-01-30 21:35:41,267 Memtable.java (line 264) Writing
Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live bytes, 298 ops)
>>  INFO [FlushWriter:2879] 2013-01-30 21:35:41,282 Memtable.java (line 305) Completed
flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db
(84559 bytes) for commitlog position ReplayPosition(segmentId=1355151995930, position=32659991)
>>  INFO [CompactionExecutor:9098] 2013-01-30 21:35:41,283 CompactionTask.java (line
109) Compacting [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-275-Data.db'),
SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db')]
>>
>> Node 151:
>>
>>  INFO [GossipTasks:1] 2013-01-30 21:35:25,689 Gossiper.java (line 831) InetAddress
/10.0.0.152 is now dead.
>>  INFO [GossipStage:1] 2013-01-30 21:35:40,677 Gossiper.java (line 817) InetAddress
/10.0.0.152 is now UP
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,677 HintedHandOffManager.java (line 296)
Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,548 ColumnFamilyStore.java (line 659)
Enqueuing flush of Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live bytes,
316 ops)
>>  INFO [FlushWriter:2636] 2013-01-30 21:35:41,549 Memtable.java (line 264) Writing
Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live bytes, 316 ops)
>>  INFO [FlushWriter:2636] 2013-01-30 21:35:41,563 Memtable.java (line 305) Completed
flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-266-Data.db
(93711 bytes) for commitlog position ReplayPosition(segmentId=1354832000013, position=20033362)
>>
>> Node 153:
>>
>>  INFO [GossipTasks:1] 2013-01-30 21:35:24,822 Gossiper.java (line 831) InetAddress
/10.0.0.152 is now dead.
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:24,896 GCInspector.java (line 122) GC for
ParNew: 294 ms for 1 collections, 1539602448 used; max is 3886022656
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:26,897 GCInspector.java (line 122) GC for
ParNew: 202 ms for 1 collections, 1829975768 used; max is 3886022656
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:27,898 GCInspector.java (line 122) GC for
ParNew: 234 ms for 1 collections, 2037767104 used; max is 3886022656
>>  INFO [MemoryMeter:1] 2013-01-30 21:35:31,151 Memtable.java (line 213) CFS(Keyspace='system',
ColumnFamily='HintsColumnFamily') liveRatio is 1.4195889775077013 (just-counted was 1.4195889775077013).
 calculation took 109ms for 64 columns
>>  INFO [MemoryMeter:1] 2013-01-30 21:35:37,806 Memtable.java (line 213) CFS(Keyspace='system',
ColumnFamily='HintsColumnFamily') liveRatio is 1.5276925191259334 (just-counted was 1.5276925191259334).
 calculation took 11ms for 129 columns
>>  INFO [GossipStage:1] 2013-01-30 21:35:40,735 Gossiper.java (line 817) InetAddress
/10.0.0.152 is now UP
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,742 HintedHandOffManager.java (line 296)
Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>>  INFO [MemoryMeter:1] 2013-01-30 21:35:43,184 Memtable.java (line 213) CFS(Keyspace='system',
ColumnFamily='HintsColumnFamily') liveRatio is 1.561312033901826 (just-counted was 1.561312033901826).
 calculation took 234ms for 144 columns
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:43,549 ColumnFamilyStore.java (line 659)
Enqueuing flush of Memtable-HintsColumnFamily@1120216043(96278/187900 serialized/live bytes,
288 ops)
>>
>> Node 154:
>>
>> INFO [HintedHandoff:1] 2013-01-30 21:35:00,729 HintedHandOffManager.java (line 392)
Finished hinted handoff of 0 rows to endpoint /10.0.0.153
>>  INFO [GossipTasks:1] 2013-01-30 21:35:26,432 Gossiper.java (line 831) InetAddress
/10.0.0.152 is now dead.
>>  INFO [GossipStage:1] 2013-01-30 21:35:40,669 Gossiper.java (line 817) InetAddress
/10.0.0.152 is now UP
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java (line 296)
Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,208 ColumnFamilyStore.java (line 659)
Enqueuing flush of Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live bytes,
286 ops)
>>  INFO [FlushWriter:2496] 2013-01-30 21:35:41,209 Memtable.java (line 264) Writing
Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live bytes, 286 ops)
>>
>>
>> Node 155:
>>
>> INFO [GossipTasks:1] 2013-01-30 21:35:27,117 Gossiper.java (line 831) InetAddress
/10.0.0.152 is now dead.
>>  INFO [GossipStage:1] 2013-01-30 21:35:40,668 Gossiper.java (line 817) InetAddress
/10.0.0.152 is now UP
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java (line 296)
Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,279 ColumnFamilyStore.java (line 659)
Enqueuing flush of Memtable-HintsColumnFamily@1083199446(94578/190654 serialized/live bytes,
316 ops)
>>
>> Meanwhile on node 152:
>>
>> INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line 305) Completed
flushing /raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-4175-Data.db
(639626 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=11032499)
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,701 GCInspector.java (line 122) GC for
ParNew: 35087 ms for 3 collections, 442940272 used; max is 3886022656
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,934 StatusLogger.java (line 57) Pool
Name                    Active   Pending   Blocked
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,941 StatusLogger.java (line 72) ReadStage
                       32       343         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) RequestResponseStage
             0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) ReadRepairStage
                  0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) MutationStage
                    0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) ReplicateOnWriteStage
            0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) GossipStage
                      0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) AntiEntropyStage
                 0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line 72) MigrationStage
                   0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line 72) StreamStage
                      0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) MemtablePostFlusher
              0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) FlushWriter
                      0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) MiscStage
                        0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) commitlog_archiver
               0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) InternalResponseStage
            0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) HintedHandoff
                    0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 77) CompactionManager
                0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 89) MessagingService
               n/a      0,24
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 99) Cache
Type                     Size                 Capacity               KeysToSave          
                                              Provider
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 100) KeyCache
                    133569                  2184533                      all
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 106) RowCache
                         0                        0                      all             
org.apache.cassandra.cache.SerializingCacheProvider
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 113) ColumnFamily
               Memtable ops,data
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 116) system.NodeIdInfo
                        0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.IndexInfo
                         0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.LocationInfo
                      0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.Versions
                        3,103
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.schema_keyspaces
                  0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,950 StatusLogger.java (line 116) system.Migrations
                        0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,059 StatusLogger.java (line 116) system.schema_columnfamilies
                0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.schema_columns
                    0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.HintsColumnFamily
              1,2369
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.Schema
                            0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_mailPipes
           9241,30380203
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_profilePics
           427,1173642
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_SBVars
                        0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_SBCounters
                  5,232
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,092 StatusLogger.java (line 116) SB_NNCD.CF_userPhoneAgenda
       7919,2332898
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_BSLUsers
                      0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_mailTimeline
                  0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_SBMessages
         54390,15054201
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_NewsFeedTimeline
       1866,7681861
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_BSLSpots
                    1,616
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBUniqueIds
                   0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBChecks
                      0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_NFTimelines
            423,533156
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBAppUsersProfilePic
                0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_SBLikes
                    16,565
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_SBCodes
                 154,93735
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_attachmentsPerUser
          80,107551
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_comments
                 58,77581
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_attachments
            641,247796
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_SBStats
                       0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_userReversedPhoneAgenda
       9047,7132129
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_ACD.CF_GSMembers
              823,1206779
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,113 StatusLogger.java (line 116) SB_ACD.CF_NSLoad
                         0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,114 StatusLogger.java (line 116) SB_ACD.CF_sessions
                       0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,115 StatusLogger.java (line 116) SB_ACD.CF_userGS
                  193,577229
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line 116) SB_ACD.CF_userApps
                     8,170
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line 116) SB_ACD.CF_GSData
                    126,3503
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,117 StatusLogger.java (line 116) SB_ACD.CF_R2Notifications
         1906,58481
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,118 StatusLogger.java (line 116) SB_ACD.CF_uniqueKeys
                     0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,155 StatusLogger.java (line 116) SB_ACD.CF_userContactWannabes
         953,438671
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,157 StatusLogger.java (line 116) SB_ACD.CF_SBApps
                         0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,158 StatusLogger.java (line 116) SB_ACD.CF_SBAppUsers
                     0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line 116) SB_ACD.CF_R1Notifications
       1354,2070513
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line 116) SB_ACD.CF_SBVotes
                 463,230386
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,161 StatusLogger.java (line 116) SB_ACD.CF_userContacts
          2968,4849519
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line 116) SB_ACD.CF_sessionsSuspended
                0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line 116) SB_ACD.CF_userBlockedList
              5,207
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,172 StatusLogger.java (line 116) SB_ACD.CF_users
               65386,10852776
>> WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 GCInspector.java (line 145) Heap
is 0.7514124302624755 full.  You may need to reduce memtable and/or cache sizes.  Cassandra
will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at
threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>>  WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 StorageService.java (line 2880)
Flushing CFS(Keyspace='SB_NNCD', ColumnFamily='CF_mailPipes') to relieve memory pressure
>>  INFO [ScheduledTasks:1] 2013-01-30 21:37:05,896 ColumnFamilyStore.java (line 659)
Enqueuing flush of Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live bytes,
9380 ops)
>>  INFO [FlushWriter:13] 2013-01-30 21:37:05,909 Memtable.java (line 264) Writing Memtable-CF_mailPipes@192792072(6868296/30790355
serialized/live bytes, 9380 ops)
>>  INFO [FlushWriter:13] 2013-01-30 21:37:06,385 Memtable.java (line 305) Completed
flushing /raid0/cassandra/data/SB_NNCD/CF_mailPipes/SB_NNCD-CF_mailPipes-hf-2165-Data.db (2049401
bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=13578602)
>>  WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202) setting live
ratio to maximum of 64.0 instead of 751.6512549537648
>>  INFO [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 213) CFS(Keyspace='SB_ACD',
ColumnFamily='CF_R2Notifications') liveRatio is 64.0 (just-counted was 64.0).  calculation
took 61ms for 8 columns
>>  INFO [MemoryMeter:1] 2013-01-30 21:40:47,274 Memtable.java (line 213) CFS(Keyspace='SB_ACD',
ColumnFamily='CF_userGS') liveRatio is 5.309650177091864 (just-counted was 5.309650177091864).
 calculation took 11ms for 170 columns
>>  INFO [HintedHandoff:1] 2013-01-30 21:42:03,709 HintedHandOffManager.java (line 296)
Started hinted handoff for token: 85070591730234615865843651857942052864 with IP: /10.0.0.153
>>  INFO [HintedHandoff:1] 2013-01-30 21:42:03,712 ColumnFamilyStore.java (line 659)
Enqueuing flush of Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live bytes, 2
ops)
>>  INFO [FlushWriter:14] 2013-01-30 21:42:03,713 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1676047624(950/2369
serialized/live bytes, 2 ops)
>>  INFO [FlushWriter:14] 2013-01-30 21:42:03,730 Memtable.java (line 305) Completed
flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db
(1016 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=17323736)
>>  INFO [CompactionExecutor:48] 2013-01-30 21:42:03,731 CompactionTask.java (line 109)
Compacting [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db'),
SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-121-Data.db')]
>>
>> The cluster is perfectly balanced:
>>
>> Note: Ownership information does not include topology, please specify a keyspace.
>> Address         DC          Rack        Status State   Load            Owns     
          Token
>>                                                                                 
          141784319550391032739561396922763706368
>> 10.0.0.150      datacenter1 rack1       Up     Normal  13.25 GB        16.67%   
          0
>> 10.0.0.151      datacenter1 rack1       Up     Normal  10.31 GB        16.67%   
          28356863910078203714492389662765613056
>> 10.0.0.152      datacenter1 rack1       Up     Normal  8.58 GB         16.67%   
          56713727820156407428984779325531226112
>> 10.0.0.153      datacenter1 rack1       Up     Normal  8.1 GB          16.67%   
          85070591730234615865843651857942052864
>> 10.0.0.154      datacenter1 rack1       Up     Normal  8.28 GB         16.67%   
          113427455640312814857969558651062452224
>> 10.0.0.155      datacenter1 rack1       Up     Normal  9.38 GB         16.67%   
          141784319550391032739561396922763706368
>>
>>
>> I guess something happens on 152 that makes other nodes believe it is dead, then
does something to fix itself and comes back after (messing up performance in the meanwhile)
.Does this makes any sense? Any clue of what might be going on? I can post more logs if you
guys need
>>
>> Thanks again!!
>>
>> Guillermo
>>
>>
>>
>> 2013/1/30 Nate McCall <zznate.m@gmail.com>
>>>
>>> What's the output of nodetool cfstats for those 2 column families on
>>> cassNode2 and cassNode3? And what is the replication factor for this
>>> cluster?
>>>
>>> Per the previous reply, nodetool ring should show each of your nodes
>>> with ~16.7% of the data if well balanced.
>>>
>>> Also, the auto-detection for memory sizes in the startup script is a
>>> little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of
>>> ram. It usually ends up allocating 4g/400m (max and young) whereas
>>> 8g/800m will give you some more breathing room.
>>>
>>> On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot <btalbot@aeriagames.com>
wrote:
>>> > My guess is that those one or two nodes with the gc pressure also have more
>>> > rows in your big CF.  More rows could be due to imbalanced distribution
if
>>> > your'e not using a random partitioner or from those nodes not yet removing
>>> > deleted rows which other nodes may have done.
>>> >
>>> > JVM heap space is used for a few things which scale with key count
>>> > including:
>>> > - bloom filter (for C* < 1.2)
>>> > - index samples
>>> >
>>> > Other space is used but can be more easily controlled by tuning for
>>> > - memtable
>>> > - compaction
>>> > - key cache
>>> > - row cache
>>> >
>>> >
>>> > So, if those nodes have more rows (check using "nodetool ring" or "nodetool
>>> > cfstats") than the others you can try to:
>>> > - reduce the number of rows by adding nodes, run manual / tune compactions
>>> > to remove rows with expired tombstones, etc.
>>> > - increase bloom filter fp chance
>>> > - increase jvm heap size (don't go too big)
>>> > - disable key or row cache
>>> > - increase index sample interval
>>> >
>>> > Not all of those things are generally good especially to the extreme so
>>> > don't go setting a 20 GB jvm heap without understanding the consequences
for
>>> > example.
>>> >
>>> > -Bryan
>>> >
>>> >
>>> > On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero
>>> > <guillermo.barbero@spotbros.com> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >>   I'm viewing a weird behaviour in my cassandra cluster. Most of the
>>> >> warning messages are due to Heap is % full. According to this link
>>> >>
>>> >> (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html)
>>> >> there are two ways to "reduce pressure":
>>> >> 1. Decrease the cache sizes
>>> >> 2. Increase the index interval size
>>> >>
>>> >> Most of the flushes are in two column families (users and messages),
I
>>> >> guess that's because the most mutations are there.
>>> >>
>>> >> I still have not applied those changes to the production environment.
>>> >> Do you recommend any other meassure? Should I set specific tunning for
>>> >> these two CFs? Should I check another metric?
>>> >>
>>> >> Additionally, the distribution of warning messages is not uniform
>>> >> along the cluster. Why could cassandra be doing this? What should I
do
>>> >> to find out how to fix this?
>>> >>
>>> >> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
>>> >> the java version is the following:
>>> >> java version "1.6.0_37"
>>> >> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
>>> >> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>>> >>
>>> >> The cassandra system.log is resumed here (numer of messages, cassandra
>>> >> node, class that reports the message, first word of the message)
>>> >> 2013-01-26
>>> >>       5 cassNode0: GCInspector.java Heap
>>> >>       5 cassNode0: StorageService.java Flushing
>>> >>     232 cassNode2: GCInspector.java Heap
>>> >>     232 cassNode2: StorageService.java Flushing
>>> >>     104 cassNode3: GCInspector.java Heap
>>> >>     104 cassNode3: StorageService.java Flushing
>>> >>       3 cassNode4: GCInspector.java Heap
>>> >>       3 cassNode4: StorageService.java Flushing
>>> >>       3 cassNode5: GCInspector.java Heap
>>> >>       3 cassNode5: StorageService.java Flushing
>>> >>
>>> >> 2013-01-27
>>> >>       2 cassNode0: GCInspector.java Heap
>>> >>       2 cassNode0: StorageService.java Flushing
>>> >>       3 cassNode1: GCInspector.java Heap
>>> >>       3 cassNode1: StorageService.java Flushing
>>> >>     189 cassNode2: GCInspector.java Heap
>>> >>     189 cassNode2: StorageService.java Flushing
>>> >>     104 cassNode3: GCInspector.java Heap
>>> >>     104 cassNode3: StorageService.java Flushing
>>> >>       1 cassNode4: GCInspector.java Heap
>>> >>       1 cassNode4: StorageService.java Flushing
>>> >>       1 cassNode5: GCInspector.java Heap
>>> >>       1 cassNode5: StorageService.java Flushing
>>> >>
>>> >> 2013-01-28
>>> >>       2 cassNode0: GCInspector.java Heap
>>> >>       2 cassNode0: StorageService.java Flushing
>>> >>       1 cassNode1: GCInspector.java Heap
>>> >>       1 cassNode1: StorageService.java Flushing
>>> >>       1 cassNode2: AutoSavingCache.java Reducing
>>> >>     343 cassNode2: GCInspector.java Heap
>>> >>     342 cassNode2: StorageService.java Flushing
>>> >>     181 cassNode3: GCInspector.java Heap
>>> >>     181 cassNode3: StorageService.java Flushing
>>> >>       4 cassNode4: GCInspector.java Heap
>>> >>       4 cassNode4: StorageService.java Flushing
>>> >>       3 cassNode5: GCInspector.java Heap
>>> >>       3 cassNode5: StorageService.java Flushing
>>> >>
>>> >> 2013-01-29
>>> >>       2 cassNode0: GCInspector.java Heap
>>> >>       2 cassNode0: StorageService.java Flushing
>>> >>       3 cassNode1: GCInspector.java Heap
>>> >>       3 cassNode1: StorageService.java Flushing
>>> >>     156 cassNode2: GCInspector.java Heap
>>> >>     156 cassNode2: StorageService.java Flushing
>>> >>      71 cassNode3: GCInspector.java Heap
>>> >>      71 cassNode3: StorageService.java Flushing
>>> >>       2 cassNode4: GCInspector.java Heap
>>> >>       2 cassNode4: StorageService.java Flushing
>>> >>       2 cassNode5: GCInspector.java Heap
>>> >>       1 cassNode5: Memtable.java setting
>>> >>       2 cassNode5: StorageService.java Flushing
>>> >>
>>> >> --
>>> >>
>>> >> Guillermo Barbero - Backend Team
>>> >>
>>> >> Spotbros Technologies
>>> >
>>> >
>>> >
>>
>>
>>
>>
>> --
>>
>> Guillermo Barbero - Backend Team
>>
>> Spotbros Technologies
>>
>>
>>
>
>
>
> --
>
> Guillermo Barbero - Backend Team
>
> Spotbros Technologies

Mime
View raw message